5 Reasons for Storing XML in a Database
September 28, 2010
Let me start this topic with a couple of disclaimers. First, using XML as a storage format is not always the right choice – there are still valid cases where it can make sense to convert XML to relational format. Second, there are more than five reasons for storing XML in a database. Here I am just listing five reasons –in no particular order- that seem to be quite common.
1. When business records are represented as XML when they are processed or transmitted by applications.
When applications have already decided to use XML as the data format of choice, e.g. in a service-oriented architecture (SOA), and are transmitting and/or processing business data in XML format anyway, then it is often a good idea to also store the business data in the same XML format. If the applications use XML anyway, it is typically beneficial to avoid the overhead of disassembling XML just for storage, and to avoid reconstructing the XML when applications need to read data.
2. When the data format (schema) changes over time.
Let’s face it: schema changes in a relational database can be a pain in the neck. Just imagine a 1-to-1 relationship that evolves to a 1-to-many relationship. For example, assume that customer records are no longer restricted to one telephone number but can contain multiple phone numbers. Following traditional relational database design principles, you would now have to normalize your existing tables, i.e. introduce a new table for phone numbers with a 1-to-many join relationship from the customer table. This is a massive change. Any SQL statements that retrieve phone numbers or have search conditions on phone numbers need to be changed. However, if the customer records are stored as XML documents in an XML column, then the table definition does not need to be changed to accommodate the evolution from 1-to-1 to a 1-to-many relationship. An XML element “phone” would simply be allowed to occur multiple times per document. An XML index on these phone elements would also remain unchanged. With XML, the impact of schema evolution is (not zero but) much lower than with a relational database schema. This enables applications to react to changing business requirements faster and at a lower cost.
3. When the data format (schema) is complex and highly variable.
In today’s world, business record are often very complex, typically more complex than 50 years ago. Think of financial trades, electronic medical records, insurance contracts and claims, or rich content such as office documents. The complexity and variability and the number of optional data fields in such business records can be staggering. Designing a relational schema for such complex objects can be very challenging and can lead to dozens or even hundreds of tables that are difficult to understand from an application perspective. In such cases, storing each business record as a separate XML document can often simplify both database design and application development.
4. When you need to optimize object-centric data access.
Continuing the previous thought, what happens when complex business objects (trades, medical records, insurance claims, etc.) are mapped to and stored in a complex and normalized relational database schema? Storing a single business object may require SQL inserts into many tables, and reading a single business object may require complex multi-way joins and/or a sequence of many SQL queries to read all pieces of an object. In such cases, experience has shown that inserting or retrieving a single XML document is significantly more efficient. For such object-centric data access, XML can outperform a relational solution.
5. When you need the best of both worlds.
The world is not black and white. In particular, many application scenarios are not either 100% XML or 100% relational. Often, some data is better represented in one format while other is better represented in another. This makes a hybrid database design very appealing, since it allows you to manage both XML documents and relational data in the same database and even in the same table. You can than query across XML documents and relational data at the same time, even in a single query. For example, you can keep the fixed and structured parts of a record in relational columns and the variable or semi-structured parts of the same record in an XML column.