A true story about using XML forms and DB2 to process millions of tax returns
June 24, 2011
A short summary of why and how the New York State Department of Taxation & Finance is using DB2 pureXML is available here:
Many of the reasons why XML is useful for forms processing also apply to other forms-based applications, e.g. in healthcare, insurance, government, and other industries. Most cases of forms processing share the following characteristics:
- Many different types of forms are being used, each one having a different set of fields. (This is schema diversity!)
- Some forms tend to change over time, usually to support new or changed business processes or regulatory requirements. Simply put, fields on the forms get changed, added, or removed. (This is schema evolution!)
- When information is filled into any given form, there are typically many optional fields that can (and often will) remain blank. (This is sparsely populated data!)
As it turns out, XML is very well suited to handle schema diversity, schema evolution, and sparse data. Optional XML elements and attributes, which may or may not appear in a given document, are a convenient way to handle sparse data: a blank field is simply absent from the document. In a relational model, by contrast, sparse data often leads to large numbers of NULL values.
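To make the sparse-data point concrete, here is a minimal sketch in Python using only the standard library. The form type, the field names, and the `to_relational_row` helper are all made up for illustration and are not taken from any real tax form. It compares what an XML document carries for a mostly blank form with what a fixed-width relational row would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of every field the (imaginary) form could contain.
ALL_FIELDS = ["wages", "interest", "dividends",
              "farm_income", "alimony", "rental_income"]

# A filled-in form: only two of the six optional fields were used,
# so only two elements appear in the document.
form_xml = """
<form type="EXAMPLE" version="2010">
  <wages>52000</wages>
  <interest>130</interest>
</form>
"""

def to_relational_row(doc):
    """Flatten a form into one column per possible field.

    findtext() returns None for a missing element, which plays
    the role of a NULL in the relational row.
    """
    return {name: doc.findtext(name) for name in ALL_FIELDS}

doc = ET.fromstring(form_xml)

# Fields actually present in the XML document.
xml_fields = [child.tag for child in doc]

# The same form as a fixed-width row, with NULLs for blank fields.
row = to_relational_row(doc)
null_count = sum(1 for value in row.values() if value is None)

print("XML elements stored:", xml_fields)
print("Relational columns:", len(row), "of which NULL:", null_count)
```

With real forms that have hundreds of optional fields, the ratio of NULL columns to populated ones in the relational row can grow much larger than in this toy case.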
XML also supports schema flexibility, e.g. documents for different schemas (or different versions of the same schema) can be stored, indexed, and queried in a single XML column.
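As a rough illustration of that idea, the following Python sketch stores three documents with different shapes in one column and runs the same path query over all of them. Note the assumptions: SQLite with a plain TEXT column is only a stand-in for an XML column, the parsing happens in the application rather than in the database, and the table name, form types, and element names are invented for this example. It does not show DB2 syntax.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database; the TEXT column "doc" stands in for an XML column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (id INTEGER PRIMARY KEY, doc TEXT)")

# Three hypothetical documents: two versions of one form type
# (version 2 adds an <email> field) and a different form type.
docs = [
    '<income_form version="1"><taxpayer><name>A. Smith</name></taxpayer>'
    '<wages>52000</wages></income_form>',
    '<income_form version="2"><taxpayer><name>B. Jones</name>'
    '<email>b@example.com</email></taxpayer><wages>61000</wages></income_form>',
    '<sales_tax_form version="1"><taxpayer><name>C. Shop</name></taxpayer>'
    '<gross_sales>98000</gross_sales></sales_tax_form>',
]
conn.executemany("INSERT INTO forms (doc) VALUES (?)", [(d,) for d in docs])

def taxpayer_names(connection):
    """Run one path expression against every stored document,
    regardless of its form type or schema version."""
    names = []
    for (doc_text,) in connection.execute("SELECT doc FROM forms ORDER BY id"):
        root = ET.fromstring(doc_text)
        names.append(root.findtext("taxpayer/name"))
    return names

names = taxpayer_names(conn)
print(names)
```

No table had to be altered when version 2 added a field or when a new form type arrived; the one column accepted all three documents as they were.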
And finally, if most of the application operations touch one form at a time (insert a form, retrieve a form, update a form, validate a form, and so on), then an XML solution where each form is a single XML object typically also provides significant performance benefits over a normalized relational schema for the same information, because a normalized design usually spreads one form across multiple tables and rows that have to be written and joined back together.
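The sketch below shows why one-form-at-a-time operations favor a single-object design, by counting the rows written for the same form under two layouts. Everything here is an assumption made for illustration: SQLite stands in for the database, the "normalized" layout is a simple header-plus-fields design (a real normalized schema could look quite different), and the table and field names are hypothetical. Row counts are only a proxy for work done, not a performance measurement.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout 1: one row per form, the whole form in one column.
    CREATE TABLE form_doc (form_id INTEGER PRIMARY KEY, doc TEXT);
    -- Layout 2: a hypothetical normalized design, one row per field.
    CREATE TABLE form_header (form_id INTEGER PRIMARY KEY, form_type TEXT);
    CREATE TABLE form_field (form_id INTEGER, name TEXT, value TEXT);
""")

form_xml = ('<form type="EXAMPLE"><wages>52000</wages>'
            '<interest>130</interest><dividends>75</dividends></form>')

def insert_as_document(form_id, xml_text):
    """Store the form as a single object; returns rows written."""
    conn.execute("INSERT INTO form_doc VALUES (?, ?)", (form_id, xml_text))
    return 1

def insert_normalized(form_id, xml_text):
    """Shred the form into header and field rows; returns rows written."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO form_header VALUES (?, ?)",
                 (form_id, root.get("type")))
    rows = [(form_id, child.tag, child.text) for child in root]
    conn.executemany("INSERT INTO form_field VALUES (?, ?, ?)", rows)
    return 1 + len(rows)

doc_rows = insert_as_document(1, form_xml)
normalized_rows = insert_normalized(1, form_xml)

# Retrieval: one lookup for the document layout, versus a join
# that returns one row per field for the normalized layout.
stored = conn.execute(
    "SELECT doc FROM form_doc WHERE form_id = 1").fetchone()[0]
joined = conn.execute(
    "SELECT h.form_type, f.name, f.value "
    "FROM form_header h JOIN form_field f ON f.form_id = h.form_id "
    "WHERE h.form_id = 1").fetchall()

print("rows written:", doc_rows, "vs", normalized_rows)
print("rows read back:", 1, "vs", len(joined))
```

The gap widens with the number of fields per form, since the document layout always touches one row while the shredded layout touches one row per field plus the header.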

