Dec 12, 2007

Extracting only the body text from an HTML file, without the HTML tags

We can use Lynx with the --dump option, which renders the page and writes the resulting plain text to standard output:

lynx --dump myfile.html > myfile.txt

Or, to fetch a page from a URL and convert it in one step:

lynx --dump http://mysite.com/index.html > myfile.txt
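If you do this often, the two commands above can be wrapped in a small shell function. This is only a sketch: the function name html_to_text is made up for illustration, the option is written as -dump (the single-dash spelling used in the Lynx man page), and -nolist is an extra Lynx option, not used in the commands above, that leaves out the numbered list of links Lynx normally appends to a dump. Drop -nolist if you want to keep the link list.

```shell
#!/bin/sh
# html_to_text SOURCE [OUTPUT]
# Hypothetical helper: SOURCE is a local HTML file or a URL.
# If OUTPUT is given, the text is written there; otherwise it goes to stdout.
html_to_text() {
    src=${1:-}
    out=${2:-}

    if [ -z "$src" ]; then
        echo "usage: html_to_text SOURCE [OUTPUT]" >&2
        return 2
    fi

    # -dump   : render the page and print it as plain text
    # -nolist : assumption/optional, omit the trailing list of links
    if [ -n "$out" ]; then
        lynx -dump -nolist "$src" > "$out"
    else
        lynx -dump -nolist "$src"
    fi
}
```

For example, html_to_text myfile.html myfile.txt writes the text to myfile.txt, and html_to_text http://mysite.com/index.html | less pages through the text of a remote page.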
