XML-only Databases

May 13, 2008

Some database vendors offer XML-only databases. That is, these vendors offer native XML databases that do not support other types of data.

Such databases work well for isolated XML data. You can consider them if you do not expect your XML data to work with other information in your organization. However, if you choose this route and later want to analyze or use the XML data together with other related information, you will at best face a costly and troublesome integration effort. Also keep in mind that performance is likely to suffer when the data lives in non-integrated databases, since combining it means moving data between systems.
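To make the integration effort concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the order documents, the `customers` table, and the function names are invented for illustration, a plain dictionary stands in for the XML-only store, and an in-memory SQLite database stands in for the existing relational system. The point is only that when the two kinds of data live apart, the join has to be done by hand in application code.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML-only store: order documents keyed by id.
XML_STORE = {
    "order-1": "<order><customerId>1</customerId><total>120.50</total></order>",
    "order-2": "<order><customerId>2</customerId><total>80.00</total></order>",
    "order-3": "<order><customerId>1</customerId><total>19.50</total></order>",
}


def build_relational_db():
    """Create a hypothetical relational database holding customer data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    return conn


def total_by_region(conn, xml_store):
    """Join XML orders with relational customers in application code."""
    # Pull the relational side out of its database first.
    region_of = dict(conn.execute("SELECT id, region FROM customers"))

    totals = {}
    for doc in xml_store.values():
        # Every document is fetched and parsed outside either database.
        order = ET.fromstring(doc)
        customer_id = int(order.findtext("customerId"))
        amount = float(order.findtext("total"))
        region = region_of[customer_id]
        totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(total_by_region(build_relational_db(), XML_STORE))
```

Neither database can optimize this query, because neither one sees the whole of it. All of the data has to be shipped to the application before the join can happen.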

Many people say that the recent high-tech past was about business automation, and that the near future is about business optimization. In other words, now that you have used technology to automate many aspects of your business, you will focus on optimizing these systems to gain competitive advantage. To do this, you will want to integrate and leverage all information assets in your environment and thus obtain maximum value from those assets. This is a long-winded way of saying that it is highly likely that you will need to harness all information in your environment, including both XML data and the existing data in other formats. That existing data is not, and probably never will be, stored in XML. If you use an XML-only database, integrating the XML data with other information assets will likely pose a significant problem.

On a parting note, consider the practicalities: multiple interfaces for different databases, multiple data management tools, different hardware and software systems, different maintenance schedules, and the lack of advanced data management features in XML-only databases. It all adds up to an unwieldy and expensive proposition.

When you take all this into consideration, is an XML-only database a wise choice?

The basic unit of storage in a native XML database is the XML document itself. In other words, XML data is stored “as is” in a native XML database. What a native XML database does is straightforward. How it does this is not. Each vendor implements its native XML database in a different way, with very different performance characteristics and capabilities.

When choosing a native XML database, you should carefully consider the implications of choosing one vendor’s database over another. The performance, scalability, and capabilities of the applications you are building will vary depending on your choice of database.

Depending on which forums you read, there are differing opinions regarding which vendor’s implementation offers the best technology. Many will argue that IBM offers the most “native” implementation, followed by Microsoft, and then followed by Oracle. The actual answer for you probably depends on the nature of the applications that you are building on top of the database. The data for each application and the query characteristics for each application are different. Also, an application may have specific requirements like XML schema flexibility. So keep these in mind as you evaluate the different options.

While I cannot authoritatively say that one implementation is better than another, I can say that IBM is confident that its DB2 pureXML implementation offers the best performance, as well as the richest set of XML capabilities. IBM has published the results of the TPoX benchmark for DB2 pureXML. Transaction Processing over XML (TPoX) is an open source benchmark for XML database systems, hosted as a project on SourceForge. Interestingly, the other major database vendors have thus far not published TPoX results for their systems.