Not All Native XML Databases Are Equal
May 12, 2008
The basic unit of storage in a native XML database is the XML document itself. In other words, XML data is stored “as is” in a native XML database. What a native XML database does is straightforward; how it does it is not. Each vendor implements its native XML database in a different way, with very different performance characteristics and capabilities.
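To make the “as is” idea concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in and not how any vendor's product works internally: the `ToyXmlStore` class, its `put` and `query` methods, and the sample order documents are all hypothetical names invented for illustration. It only shows the general idea that each document is kept whole and queried with path expressions, rather than being split into predefined columns.

```python
import xml.etree.ElementTree as ET


class ToyXmlStore:
    """Hypothetical in-memory stand-in for a native XML store (illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, xml_text):
        # Parse once and keep the whole tree; no columns are defined up front.
        self._docs[doc_id] = ET.fromstring(xml_text)

    def query(self, path):
        # Evaluate the same path expression against every stored document.
        return [
            (doc_id, node.text)
            for doc_id, root in self._docs.items()
            for node in root.findall(path)
        ]


store = ToyXmlStore()
store.put("order-1", "<order><customer>Ann</customer><item sku='A1'>Widget</item></order>")
# The second document carries an extra <note> element; the store accepts it unchanged.
store.put("order-2", "<order><customer>Bob</customer><item sku='B2'>Gadget</item><note>rush</note></order>")

customers = store.query("./customer")
notes = store.query("./note")
print(customers)
print(notes)
```

The second document has an element the first one lacks, and nothing in the store had to change to accept it. A real product differs in exactly the areas this sketch ignores (on-disk format, indexing, query optimization), which is where the vendor implementations diverge.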
When choosing a native XML database, you should carefully consider the implications of choosing one vendor’s database over another. The performance, scalability, and capabilities of the applications you are building will vary depending on your choice of database.
Depending on which forums you read, opinions differ on which vendor’s implementation offers the best technology. Many will argue that IBM offers the most “native” implementation, followed by Microsoft and then Oracle. The right answer for you probably depends on the nature of the applications you are building on top of the database. Each application has its own data and its own query characteristics, and an application may also have specific requirements such as XML schema flexibility. Keep these factors in mind as you evaluate the different options.
While I cannot authoritatively say that one implementation is better than another, I can say that IBM is confident that its DB2 pureXML implementation offers the best performance, as well as the richest set of XML capabilities. IBM has published the results of the TPoX benchmark for DB2 pureXML. Transaction Processing over XML (TPoX) is an open source benchmark for XML database systems, hosted as a project on SourceForge. Interestingly, the other major database vendors have not yet published TPoX results for their systems.