Flexible Schemas: When to Persist Data in XML Instead of Relational

September 26, 2008

One of the great benefits of XML as a format for persisting data is that it is relatively easy to update the schema. XML was designed to be inherently flexible. Adding, removing, and updating elements or attributes in the data and schema are relatively straightforward operations.
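
To make the point about schema changes concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `order` documents, the `priority` element, and the `read_order` helper are all hypothetical. The idea is that a reader written to tolerate missing elements keeps working when a later version of the schema adds an element, without migrating documents that are already stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical order documents: "version 2" adds an optional <priority> element.
ORDER_V1 = "<order id='1'><product>DSL</product><qty>1</qty></order>"
ORDER_V2 = ("<order id='2'><product>Fibre</product><qty>2</qty>"
            "<priority>high</priority></order>")


def read_order(xml_text: str) -> dict:
    """Extract fields from an order, tolerating elements older documents lack."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.get("id")),
        "product": root.findtext("product"),
        "qty": int(root.findtext("qty")),
        # findtext returns the default when the element is absent, so
        # version-1 documents still parse without any migration step.
        "priority": root.findtext("priority", default="normal"),
    }


for doc in (ORDER_V1, ORDER_V2):
    print(read_order(doc))
```

In a purely relational design, the equivalent change would usually mean altering the table definition and deciding what value existing rows should get.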

In the past, many organizations persisted XML data by mapping it into relational tables, and ended up pulling their hair out whenever they later had to update their schemas. Increasingly, organizations are choosing to store this data natively in XML format.

Of course, you could store XML data in a Character Large OBject (CLOB) or Binary Large OBject (BLOB) column in a relational table and still enjoy the benefits of easier schema management. However, the overhead involved in retrieving data from a CLOB or BLOB often makes this approach unworkable. To work with the XML data, you must retrieve the entire CLOB or BLOB and then process its contents with an XML parser before you can use any of the data. This is an awkward and inefficient architecture that incurs significant overhead on every read operation. Many organizations are turning to data management solutions like DB2 pureXML that allow them to store and process their XML data natively.
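
The CLOB/BLOB pattern described above can be sketched in Python with the standard `sqlite3` module. Here SQLite only stands in for any relational database that holds XML as opaque text, and the `orders` table and `get_quantity` helper are hypothetical. Every lookup of a single value fetches the whole document and parses it from scratch, which is the per-read overhead the paragraph describes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# An in-memory SQLite database stands in for a relational table with a
# large text column (the CLOB-style approach). Table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO orders (id, doc) VALUES (?, ?)",
    (1, "<order><product>DSL</product><qty>3</qty></order>"),
)


def get_quantity(order_id: int) -> int:
    # Step 1: retrieve the entire document, even though we need one value.
    row = conn.execute(
        "SELECT doc FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    # Step 2: run the whole document through an XML parser.
    root = ET.fromstring(row[0])
    # Step 3: only now can we navigate to the value we actually wanted.
    return int(root.findtext("qty"))


print(get_quantity(1))  # 3
```

Because the database sees the column as plain text, it cannot index or filter on the `qty` element. That work happens in the application after every read.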

Here at IBM, we have come across several instances where organizations have embraced the flexibility of XML schemas and chosen to persist their data in a native XML format:

  • One of the world’s leading telecommunications companies recently overhauled its order entry systems. Designing an order entry system that caters for many thousands of products and services across a variety of geographies is a tremendous challenge, especially when the schema must cater for both current and future offerings. It was for this reason that the telecommunications giant decided to store its order data in XML format. Thanks to the flexible nature of XML schemas, it can cater for its existing complex needs while minimizing the impact of introducing innovative new products and services later.
  • Taxation authorities deal with tax rules and tax forms that change every year, so their data schemas must change every year too. In some years, the changes consist of relatively straightforward field additions. In others, they involve larger reorganizations, which are much more difficult to manage. For taxation authorities, being able to easily manage schema changes from year to year is a compelling reason to persist data in XML format. IBM is currently working with multiple taxation authorities around the world to improve their tax collection systems. If you want to read about one such taxation authority’s experiences, check out New York State tax agency uses pureXML to simplify filing of more than 2 million returns already.
  • A Japanese software company developed a system that stores and manages diverse information for educational establishments. Because each educational institution has different storage needs, and because those needs evolve over time, the company found the relational model both cumbersome and expensive. You can read more about this company and see some great quotes about their switch from relational to XML at Software Research Associates Tohoku chooses IBM DB2 9 with pureXML for UniVision+EV system.
  • A leading Chinese energy and utilities corporation developed a flexible data analysis and reporting system that could handle data from extremely diverse facilities across its more than 100 constituent companies. Its approach is to store common information in relational format and to store information from diverse sources, which have different schemas, in XML format. By using such a hybrid relational/XML approach, it can take advantage of flexible XML schemas to easily accommodate additions, updates, and changes. For more information about this situation, see China Huadian Corporation chooses IBM DB2 9 with pureXML to integrate and analyze corporate property information.
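
The hybrid approach can be sketched in the same way. This minimal Python example again uses `sqlite3` as a stand-in, and the table, column, company, and element names are all hypothetical. Attributes shared by every facility live in ordinary relational columns that SQL can filter and aggregate. Facility-specific details sit in an XML column whose structure can differ from row to row. A real pureXML deployment would use a native `XML` column and query it inside the database, rather than parsing text in the application as this sketch does.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Common, stable attributes are relational columns; the variable part is XML.
conn.execute(
    "CREATE TABLE facility ("
    " id INTEGER PRIMARY KEY,"
    " company TEXT NOT NULL,"
    " capacity_mw REAL NOT NULL,"
    " details TEXT)"  # XML document; its shape varies by facility type
)
rows = [  # hypothetical sample data
    (1, "Company A", 600.0,
     "<thermal><fuel>coal</fuel><units>2</units></thermal>"),
    (2, "Company B", 120.0,
     "<hydro><river>Example River</river><dams>1</dams></hydro>"),
    (3, "Company A", 50.0,
     "<wind><turbines>25</turbines></wind>"),
]
conn.executemany("INSERT INTO facility VALUES (?, ?, ?, ?)", rows)

# Relational part: ordinary SQL over the common columns.
totals = dict(conn.execute(
    "SELECT company, SUM(capacity_mw) FROM facility "
    "GROUP BY company ORDER BY company"
).fetchall())
print(totals)  # {'Company A': 650.0, 'Company B': 120.0}

# XML part: each facility type carries its own structure, and a new type
# can be stored without changing the table definition.
for fid, details in conn.execute(
        "SELECT id, details FROM facility ORDER BY id"):
    root = ET.fromstring(details)
    print(fid, root.tag, {child.tag: child.text for child in root})
```

Queries that need the facility-specific fields must reach into the XML. Reports that only need the shared attributes stay plain SQL.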

As you can see, all these companies take advantage of the flexible nature of XML schemas and native XML storage to make systems easier to manage and update, often implementing solutions that were previously difficult or impossible to build.


2 Responses to “Flexible Schemas: When to Persist Data in XML Instead of Relational”

  1. Brian Boberg Says:

    Conor:

    Thanks for the articles on companies or business situations that have implemented DB2’s XML features. I would appreciate more on this topic since the company I work for is about to upgrade to DB2 V9 on z/OS. The DBAs in the department where I work are quite apprehensive concerning implementing the XML data type in our databases. Management is looking for recommendations on how to utilize this new feature. Articles which provide more detail on the implementation and use of XML within DB2 would be most desirable.

    We currently have a few situations where we store XML in a VARCHAR column. Obviously, this lacks the ability to index on certain XML elements and requires reading the entire XML tree structure each time it is read.


  2. Hi Brian,

    I’ll follow up with you off-line. By the way, have you seen the DB2 pureXML Wiki? It has a lot of articles related to implementation. The list of articles may be a little overwhelming, so a good place to start is with these getting started articles.

