The OpenDocument web site is no longer accepting new posts. Information on this page is preserved for legacy purposes only. For current information on ODF, please see the OASIS OpenDocument Technical Committee.

Usage Schemas to Tame ODF and OpenXML Down-Conversions

by Rick Jelliffe, O'Reilly Articles

Kitchen-sink standards are developed by committees and have to cope with a wide variety of different applications. If someone's software does something, some element, attribute, or value has to be stuck in for it. Sometimes the backdoor of properties (open-ended value lists) is used, so that the schema can be simplified at the cost of no longer enumerating the possible values. But schemas like DocBook, TEI, ODF, and OpenXML are classic kitchen sinks. There is an objective way to detect them: check their 'Structured Document Complexity Metric' (online), and if it is over 300, you probably have a kitchen sink. I gave some metrics earlier in Comparing Office Document Formats.

Now the trouble with kitchen-sink schemas is that any particular set of documents will use only a subset of the total possible features. So writing a complete converter that accepts any possible input from a kitchen-sink schema and outputs it to some more targeted document type is a completely wasteful process. YAGNI. But, and here's the rub, every so often someone will in fact use one of these strange elements you didn't expect.

One way to cope with this is the usage schema. This is a schema derived from sampling representative documents. When new documents come in, you first validate them against the usage schema, and if there is a problem, escalate it to the project management to discuss how to handle it. It is a sign that the data is not what they expected.

There are some tools to generate XSD usage schemas, but you can also generate them using Schematron. The tool I use first generates all three-level XPaths found in the document, then makes a Schematron schema that reports if any node was found that was not caught by these XPaths. Very straightforward, but effective.

Another use for usage schemas is for software development.
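To make the three-level XPath idea concrete, here is a minimal sketch in Python. It is not the tool mentioned above and it does not emit Schematron: it simply records every chain of up to three nested element names seen in the sample documents, then reports any chain in a new document that the samples never produced. The function names (`element_paths`, `build_usage_schema`, `unexpected_paths`) are hypothetical, and the sketch assumes elements only (no attributes or text nodes).

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text, depth=3):
    """Return the set of element-name chains (at most `depth` levels) in a document."""
    paths = set()

    def walk(elem, ancestors):
        # Keep only the nearest `depth` names: grandparent/parent/element.
        chain = (ancestors + [elem.tag])[-depth:]
        paths.add("/".join(chain))
        for child in elem:
            walk(child, chain)

    walk(ET.fromstring(xml_text), [])
    return paths


def build_usage_schema(sample_docs, depth=3):
    """Union of the chains found across all representative sample documents."""
    schema = set()
    for doc in sample_docs:
        schema |= element_paths(doc, depth)
    return schema


def unexpected_paths(xml_text, schema, depth=3):
    """Chains in a new document that the samples never exhibited."""
    return sorted(element_paths(xml_text, depth) - schema)


if __name__ == "__main__":
    samples = ["<doc><p><b>bold</b></p></doc>"]
    schema = build_usage_schema(samples)
    incoming = "<doc><p><b>bold</b><table/></p></doc>"
    print(unexpected_paths(incoming, schema))  # prints ['doc/p/table']
```

A non-empty result is the cue to escalate rather than to convert blindly: the incoming document uses a structure the samples did not contain. A real implementation would probably also track attributes and namespaces, which this sketch leaves out.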