XML versus Relational Database Performance
August 22, 2010
I have been asked many times: “What is faster, XML or Relational?”. Of course, this question oversimplifies a complex issue, and so the only valid answer is “It depends!”. Sometimes people ask the same question in a slightly different way: “If I have a relational table and convert each row into a small XML document and store these documents in a separate table with 1 XML column, what’s the performance difference between the two tables (for inserts/updates/queries)?”. But, in most cases such a conversion is not recommended and this type of comparison is, again, too simplistic.
Let’s say you want store, index, and query 1 million addresses and each address has a first name, last name, street, city, state, and zip code. That’s a simple and fixed structure and it’s the same for all records. It’s a perfect fit for a relational database and can be stored in a single table. Relational databases have been optimized for decades to handle such fixed records very efficiently. However, if the application needs to convert the address data to XML format anyway, it can often be faster to store the data permanently in XML and avoid the repeated conversion from relational format to XML format in the application.
Now consider a scenario where the business objects of interest are a lot more complex and variable than simple addresses. For example, derivative trades in the financial industry are modeled in XML with an XML Schema called FpML (financial products markup language). It defines more than 7000 fields (many are optional) with hundreds of 1-to-many relationships between them. Designing a relational schema to represent such objects is very hard and leads to hundreds of tables. The process of inserting (normalizing) and reading (denormalizing) a single object into such a relational schema can easily be 10x or 100x slower than inserting and reading a corresponding XML document in a native XML column (e.g. in DB2 pureXML).
So, any performance comparison of XML versus relational depends heavily on which data you choose for the comparison, and what type of operations you measure.
A large DB2 customer recently compared XML to relational performance because their business objects are currently mapped to 12 relational tables. Their application executes at least 12 SQL statements to retrieve all the relational data that comprises one of the logical business objects. Then the application reassembles the original business object. An alternative is to store these business objects as XML so that each object is stored as a single document. Instead of 12 tables, only 1 table with 1 XML column is then needed. In multi-user tests for data retrieval the the company found that the XML-based solution allows them to retrieve objects with 55% higher throughput than the existing relational SQL-based solution. The reasons for the performance benefit include fewer JDBC interactions between application and database as well as fewer distinct pages that need to be read when one logical business object is represented as one XML document (and not scattered over 12 tables). These tests were later repeated and verified at an IBM lab.
Another important considertion in the question of XML versus relational is the data format in which the data is produced and consumed outside the database. If the data is produced and/or consumed in XML format anyway, it is often better to also store the data as XML in the database.
So, the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at. And there are also use cases (with high schema complexity and schema variability over time) that cannot reasonably be implemented in a relational data model. (If you need to go to Hawaii, the boat always beats the truck!).
The beauty of a hybrid database system such as DB2 is that you can use both native XML and relational capabilitues side by side, in a tightly integrated manner. Some data is better representated in relational tables, other data is better represented in XML, and you can manage both in the same database or even the same table, and with the same APIs and utilities.