Joe Wein
Fighting spam and scams
on the Internet

Converting Word files to HTML files

Problem: Microsoft Word 97 and 2000 let you save Word .doc files as HTML. However, the result is very bulky, nearly impossible to read, and not very compatible with non-Microsoft products.

Explanation:
HTML files created by Microsoft Word carry a lot of extra information that makes it possible to re-import the file into Word and save it as a .doc without losing certain features. The standard Save As dialog in Word is not primarily designed to create documents for upload to a web server where anyone can browse them. There are more appropriate tools for that.

Solution: Microsoft supplies a DLL for Word 2000 that lets you use File / Export To / Compact HTML to save a .doc as a compact .html file that is optimized for web publishing and not meant for re-importing into other Office products. In addition, there's an excellent freeware tool called tidy that verifies and corrects HTML files. Download it and run it on the output of File / Export To / Compact HTML. Make sure you specify the --word-2000 yes option, and note that -m makes tidy overwrite the input file with the cleaned version. For example:

tidy --word-2000 yes -m inoutfile.htm
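
If you have exported a whole folder of documents, the command above can be repeated for each file with a small script. The following Python sketch is only an illustration and makes some assumptions: tidy is installed and on the PATH, the exported files end in .htm, and the function names tidy_command and clean_exported_files are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(html_file):
    """Build the same command line shown above for one file."""
    # -m tells tidy to overwrite the input file with the cleaned result.
    return ["tidy", "--word-2000", "yes", "-m", str(html_file)]


def clean_exported_files(folder):
    """Run tidy on every .htm file in folder; return the files tidy rejected."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on the PATH")
    failed = []
    for html_file in sorted(Path(folder).glob("*.htm")):
        result = subprocess.run(tidy_command(html_file))
        # tidy exits with 0 for success, 1 for warnings, 2 for errors.
        if result.returncode >= 2:
            failed.append(html_file)
    return failed
```

Calling clean_exported_files on a folder rewrites every .htm file in it in place, so keep the original .doc files (or a copy of the exported HTML) in case you want to export again. Files that tidy could not repair are returned in a list for a manual look.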

Further reading:
http://office.microsoft.com/downloads/2000/Msohtmf2.aspx
HTML Tidy homepage at SourceForge