Why IBM DB2 is ideal for Transactional XML
November 5, 2008
As you saw in my previous post, IBM DB2 has spectacular performance numbers when benchmarking with XML data. As a reminder, DB2 performed 48.5 million transactions from 200 concurrent users over a two hour period on a server 4 CPUs, where each CPU had 6 cores. Admittedly, this is a relatively powerful server. But then again, you would expect to need a powerful server to process 48.5 million transactions in two hours. Today’s post will explain one of the key reasons why DB2 offers such impressive performance when compared to other major database vendors.
Most database vendors perform a similar set of steps when working with XML data. They parse the XML data, translate the XML data into a stream of tokens, logically represent the stream of tokens in a tree-like structure, store the tree-like structure, create indexes, and then manipulate and query the data as necessary. Of course, different vendors implement these steps in different ways, leading to some vendors enjoying a performance advantage in certain regards. For instance, some vendors have a higher reliance on large indexes than others, slowing data inserts, updates, and deletes. These vendors also typically require greater amounts of storage because of the need for these large indexes. However, it is query performance that concerns most people, so let’s focus on that.
Working with XML data poses an additional set of challenges when compared to working with traditional relational data. Many operations with XML data involve navigating around the tree-like structure that represents the XML data. For example, a query may impose a predicate on the values of the City and Age XML tags that appear inside a Customer tag, while returning the value of the Name tag that appears inside the same Customer tag. When you break this query into the atomic operations that are performed on the physical data, the query will involve many navigations of the tree-like structure. The database must navigate to the Customer element and then to the City element within it, apply the predicate, navigate to the Age element for that customer, apply the predicate, and if the predicates are satisfied, navigate to the Name element for that customer. And the database must perform these operations for many, many rows. Appreciating the need to optimize navigations of the tree-like representation of the XML data is one of the keys to understanding why IBM DB2 performs so well.
DB2 implements the tree-like structure in a way that allows you to very quickly navigate around the tree. Because DB2 stores pointers to both parent and child nodes for each element, and because DB2 optimizes the speed with which those navigations are performed, DB2 quickly navigates around the tree-like structures that represent the XML data. So, it is a combination of the physical representation of the tree-like structure, together with special algorithms for navigating around trees, that helps IBM achieve such blistering speed. After all, if you can optimize the atomic operations that underpin this breed of database queries, it only follows that you will enjoy the benefits at query time.
I do not have direct knowledge of the internal workings of other major relational database vendors. My opinions regarding how IBM manages XML data when compared with other major relational database vendors are drawn from information that is freely available on the internet. The opinions expressed in this post are mine and mine alone. They do not necessarily represent the opinions of my employer, IBM.