When to Store in XML Format
August 4, 2008
XML is unchallenged as a data exchange format. Most industries have developed standards that use XML as the data format. The list of standards is seemingly endless: ACORD, FIXML, FpML, MISMO, HL7, NIEM, OTA, SVG, TAX1120, UBL, XBRL, and so on.
Your immediate inclination is likely to store such information in its native XML format. After all, when the information is exchanged, it is expressed in XML format. Therefore it seems natural that it would also be stored in a native XML format. Often this is the case. However, it is not always the case. Sometimes it makes more sense to store the information in relational form. This blog post discusses when to use the two approaches.
The key question that you need to ask yourself is whether the information is best expressed in a relational format or in an XML format. Because XML is so popular as a platform-neutral and transparent data exchange format, data is often sent as XML even when XML is not the ideal format for the information. For instance, information is sometimes force-fitted into a hierarchical data structure even when it has no hierarchical characteristics. If this is the case, examine the data to see if it should in fact be stored in a relational format. The general rule is that you should consider a relational structure when the data is better described in tabular format. This is the characteristic that most readily defines the relational schema.
There are also situations when you may be compelled to use a relational schema. For example, if the application that consume this information work only with relational data, then you need to take this into consideration.
You should store information in XML format when:
- The information is best represented in a hierarchal format.
When you store hierarchical information in a relational database, you often require a large number of tables. Creating and maintaining such systems is complex and costly. Multiple joins result in complicated SQL statements. In such cases, you should instead use native XML storage.
- The schema is subject to change.
Because relational tables are highly-structured, changing the schema is both difficult and costly. Updating a schema for natively stored XML, on the other hand, is a relatively painless exercise. If you expect the schema to change, you should use native XML storage.
- There are many null values in the relational tables.
Relational database tables set aside storage for each column. If your information does not populate all of these columns, you can have a large number of null values in your database. This, in turn, affects both storage requirements and performance. Again, you should consider native XML storage in such situations.