Flexible Schemas: When to Persist Data in XML Instead of Relational
September 26, 2008
One of the great benefits of XML as a format for persisting data is that it is relatively easy to update the schema. XML was designed to be flexible. Adding, removing, and updating elements or attributes in the data and schema are relatively straightforward operations.
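As a rough illustration of that flexibility, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The order documents, the `discount` element, and both function names are hypothetical, invented purely for this example. It shows a reader written for the original document shape continuing to work after a new element is added, and a newer reader treating that element as optional.

```python
import xml.etree.ElementTree as ET

# Hypothetical order documents: the second adds a <discount> element.
ORDER_V1 = "<order id='1001'><product>DSL</product><price>29.99</price></order>"
ORDER_V2 = (
    "<order id='1002'><product>Fiber</product><price>49.99</price>"
    "<discount>5.00</discount></order>"
)


def read_order(xml_text):
    """Reader written against the original shape; unknown elements are ignored."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "product": root.findtext("product"),
        "price": float(root.findtext("price")),
    }


def read_discount(xml_text):
    """Newer reader: the added element is optional and defaults to zero."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("discount", default="0"))


print(read_order(ORDER_V1))
print(read_order(ORDER_V2))
print(read_discount(ORDER_V1), read_discount(ORDER_V2))
```

Neither the older documents nor the older reader had to change when the new element appeared. An equivalent change to a relational mapping would typically mean altering a table and revisiting every query that touches it.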
In the past, when persisting XML data, many organizations mapped XML data into relational tables and ended up pulling their hair out when later updating schemas. Increasingly, organizations are choosing to store this data natively in XML format.
Of course, you could store XML data in a Character Large OBject (CLOB) or a Binary Large OBject (BLOB) in a relational table and still enjoy the benefits of easier schema management. However, the overhead involved in retrieving data from a CLOB or a BLOB often makes this approach unworkable. To work with the XML data, you must first retrieve the entire CLOB or BLOB and then run its contents through an XML parser. This is an awkward and inefficient architecture that incurs significant overhead on every read operation. As a result, many organizations are turning to data management solutions like DB2 pureXML that allow them to store and process their XML data natively.
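The CLOB read path can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database with a `TEXT` column stands in for a relational table with a CLOB column, and the `orders` table, its contents, and the `totals_from_clob` function are hypothetical. It is not DB2 code.

```python
import sqlite3
import xml.etree.ElementTree as ET

# SQLite TEXT column used here as a stand-in for a CLOB column (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO orders (id, doc) VALUES (?, ?)",
    [
        (1, "<order><customer>Acme</customer><total>120.50</total></order>"),
        (2, "<order><customer>Globex</customer><total>75.00</total></order>"),
    ],
)


def totals_from_clob(connection):
    """Fetch each whole document, parse it, then pull out a single value."""
    totals = {}
    rows = connection.execute("SELECT id, doc FROM orders ORDER BY id")
    for order_id, doc in rows:
        root = ET.fromstring(doc)  # full parse on every read
        totals[order_id] = float(root.findtext("total"))
    return totals


print(totals_from_clob(conn))
```

Even though only one small value per order is needed, every document is transferred in full and parsed in the application. With native XML storage, the database engine can evaluate the path itself and return just the matching values.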
Here at IBM, we have come across several instances where organizations have embraced the flexibility of XML schemas and chosen to persist their data in a native XML format:
- One of the world’s leading telecommunications companies recently overhauled its order entry systems. Designing an order entry system that caters for many thousands of products and services in a variety of geographies is a tremendous challenge, especially when the schema must cater for both current and future offerings. For this reason, the telecommunications giant decided to store its order data in XML format. Thanks to the flexible nature of XML schemas, it can cater for its existing complex needs while minimizing the impact of later introducing innovative new products and services.
- Taxation authorities are faced with taxation rules and taxation forms that change on a yearly basis. Therefore, their data schemas must change each year. Some years, the changes consist of relatively straightforward field additions. However, some years the changes consist of larger reorganizations, which are much more difficult to manage. For taxation authorities, being able to easily manage schema changes from year to year is a compelling reason to move to persisting data in an XML format. IBM is currently working with multiple taxation authorities around the world to improve their tax collection systems. If you want to read about one such taxation authority’s experiences, check out New York State tax agency uses pureXML to simplify filing of more than 2 million returns already.
- A Japanese software company developed a system that stores and manages diverse information for educational establishments. Because each educational institution has different storage needs, and because those storage needs evolve over time, this company discovered that using the relational model was both cumbersome and expensive. You can read more about this company and see some great quotes about their switch from relational to XML at Software Research Associates Tohoku chooses IBM DB2 9 with pureXML for UniVision+EV system.
- A leading Chinese energy and utilities corporation developed a flexible data analysis and reporting system that could handle data from extremely diverse facilities across its more than 100 constituent companies. Their approach is to store common information in relational format and to store information from diverse sources, each with a different schema, in XML format. By using such a hybrid relational/XML approach, they are able to take advantage of flexible XML schemas to easily accommodate additions, updates, and changes. For more information about this situation, see China Huadian Corporation chooses IBM DB2 9 with pureXML to integrate and analyze corporate property information.
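The hybrid relational/XML approach in the last example can be pictured with a small Python sketch. Everything here is hypothetical and chosen only for illustration: the `FacilityRecord` type, its fields, and the sample facilities are not taken from any of the systems described above. Fields shared by every record play the role of relational columns, while details that differ from source to source sit in an XML fragment.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class FacilityRecord:
    # Common fields, which would be ordinary relational columns.
    facility_id: int
    company: str
    # Source-specific details, which would be stored in an XML column.
    details_xml: str


def detail(record, path):
    """Return the text at `path` in the XML details, or None if absent."""
    return ET.fromstring(record.details_xml).findtext(path)


records = [
    FacilityRecord(1, "Plant A", "<details><turbines>4</turbines></details>"),
    FacilityRecord(2, "Office B", "<details><floors>12</floors></details>"),
]

for record in records:
    print(record.company, detail(record, "turbines"), detail(record, "floors"))
```

Each record answers the same queries on its common fields, while a new kind of facility can introduce its own detail elements without any change to the shared structure.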
As you can see, all these companies take advantage of the flexible nature of XML schemas and native XML storage to make systems easier to manage and update, often implementing solutions that were previously difficult or impossible to build.