The OpenDocument XML.org web site is no longer accepting new posts. Information on this page is preserved for legacy purposes only. For current information on ODF, please see the OASIS OpenDocument Technical Committee.

Publishing Writer Documents on the Web

by Dmitri Popov, Linux.com

Although OpenOffice.org has an HTML/XHTML export feature, it is not up to snuff when it comes to turning Writer documents into clean HTML files. Instead, this feature turns even the simplest Writer documents into HTML gobbledygook, and while it attempts to preserve the original formatting, the results are often far from perfect. Moreover, publishing static HTML pages is so '90s: today, blogs and wikis rule the Web.

So what options do you have if you want to convert your Writer documents into tidy HTML pages or wiki-formatted text files? Quite a few, actually.

Let's start with the simplest scenario, where you need to convert a single Writer document into an HTML page. One way to do this is to use a pair of scripts: odt2txt.py and markdown.py. The first script converts the Writer document into a plain text file and turns the text formatting into Markdown markup. You can then convert the resulting text file into HTML using the markdown.py script...
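To make the first step of that pipeline more concrete, here is a minimal sketch of how a Writer document can be read and turned into Markdown-style text. It is not the odt2txt.py script itself: the function name odt_to_markdown is hypothetical, and the sketch only handles top-level headings and paragraphs, ignoring lists, inline formatting, and everything else. It relies on the fact that an .odt file is a ZIP archive whose text lives in content.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument XML namespaces.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_to_markdown(odt_file):
    """Hypothetical helper: return Markdown-style text for the top-level
    headings and paragraphs of an ODT file (path or file-like object)."""
    # An .odt file is a ZIP archive; the document body is in content.xml.
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        return ""

    blocks = []
    for element in body:
        # Join the text of the element and any nested spans.
        content = "".join(element.itertext()).strip()
        if not content:
            continue
        if element.tag == f"{{{TEXT_NS}}}h":
            # Map the heading's outline level to the number of '#' marks.
            level = int(element.get(f"{{{TEXT_NS}}}outline-level", "1"))
            blocks.append("#" * level + " " + content)
        elif element.tag == f"{{{TEXT_NS}}}p":
            blocks.append(content)
    # Markdown separates block elements with a blank line.
    return "\n\n".join(blocks) + "\n"
```

Calling odt_to_markdown("report.odt") (the file name is just an example) returns a string you can save as a text file. For the second step, if the Python-Markdown package is installed, markdown.markdown(text) converts such a string into HTML.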

Using the odt2txt script for intermediary conversion has another advantage. Many blog, wiki, and content management systems support the Markdown syntax either directly or via optional plugins. This means that you can easily publish the Markdown file on your wiki or blog. For example, if you are using DokuWiki, you can make it recognize Markdown by installing the markdown plugin. By default, some assembly is required...

Speaking of wikis, you can also convert the HTML file into a "native" wiki page using the excellent HTML::WikiConverter service. It supports all major wiki formatting dialects, and it's available as a standalone Perl script, which you can install and use on your own machine.

It's not all sunshine and unicorns, though, and the odt2txt script does have its limitations. The current version of the script supports the following formatting: italics, bold italics, ordered and unordered lists, block quotes, code blocks, hyperlinks, and footnotes. The two major elements that are not recognized by the script are tables and images.

If you want to publish the contents of a Writer document as a post on your blog, you can easily do so by using the functionality provided by Google Docs...

These approaches are not as straightforward as clicking an Export button, but if you want to generate tidy HTML files out of your Writer documents or publish them on your blog or wiki, you should give these techniques a try.
