The latest issue of IBM Database Magazine has an interesting article titled Healthcare’s XML Heartbeat. In this article, Ken North describes the rise and rise of XML in the healthcare industry. He talks about the key role that XML is playing in the emergence of electronic medical records, the efficient exchange of information, and increasing levels of interoperability. The article gives great insight into the XML-based electronic medical records environment at UCLA Health System.

Why won’t Oracle publish results for the Transaction Processing over XML (TPoX) benchmark?

We know that Oracle has implemented TPoX demonstration and test systems. Oracle has demonstrated TPoX systems at their conferences. Also, Oracle has included TPoX tests and data in their research efforts and as part of their X-Files demonstration. So we know that Oracle has used TPoX. Why won’t they publish benchmark results?

Oracle claims that the TPoX benchmark is narrowly scoped and that it doesn’t handle the diverse use cases of XML. They are correct in that TPoX does not model multiple scenarios. It models only one scenario: a securities trading scenario that uses a real-world XML Schema (FIXML). Such a scenario involves a high volume of relatively small XML documents. The benchmark takes into account write, update, delete, indexing, XML schema, logging, concurrency, and other database considerations. While the TPoX benchmark does indeed model only one scenario, it makes sure to incorporate a real-world mix of XML-related operations for that scenario.
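To make the idea of a "mix of operations over many small documents" concrete, here is a minimal Python sketch. It is not the TPoX workload driver, and it does not use a database or FIXML: the operation weights, the `Order`/`Symbol`/`Qty` element names, and the `make_order` and `run_workload` helpers are all made up for illustration. It only shows the general shape of a driver that picks inserts, updates, deletes, and queries according to fixed proportions.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical operation mix; these weights are illustrative, not TPoX's.
MIX = {"insert": 0.4, "update": 0.2, "delete": 0.1, "query": 0.3}


def make_order(order_id, symbol, qty):
    """Build a small order document (element names are made up, not FIXML)."""
    order = ET.Element("Order", id=str(order_id))
    ET.SubElement(order, "Symbol").text = symbol
    ET.SubElement(order, "Qty").text = str(qty)
    return ET.tostring(order, encoding="unicode")


def run_workload(n_ops, seed=0):
    """Apply a weighted mix of operations to an in-memory document store."""
    rng = random.Random(seed)
    store = {}                      # order id -> XML string
    counts = dict.fromkeys(MIX, 0)  # how often each operation ran
    next_id = 1
    ops, weights = zip(*MIX.items())
    for _ in range(n_ops):
        op = rng.choices(ops, weights=weights)[0]
        if op != "insert" and not store:
            op = "insert"  # nothing to update, delete, or query yet
        counts[op] += 1
        if op == "insert":
            symbol = rng.choice(["IBM", "ORCL"])
            store[next_id] = make_order(next_id, symbol, rng.randint(1, 500))
            next_id += 1
            continue
        key = rng.choice(list(store))
        if op == "update":
            doc = ET.fromstring(store[key])
            doc.find("Qty").text = str(rng.randint(1, 500))
            store[key] = ET.tostring(doc, encoding="unicode")
        elif op == "delete":
            del store[key]
        else:  # query: parse the document and read one field
            ET.fromstring(store[key]).findtext("Symbol")
    return counts, store
```

A real benchmark driver would run such a mix from many concurrent connections against the database under test and report throughput; the point here is only that the proportions of the mix are part of what defines the scenario.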

Database benchmarks are always focused on a specific usage scenario, and TPoX is no exception. Relational database benchmarks have always taken the same approach: TPC-C focuses on OLTP systems, TPC-W on web-based transaction systems, TPC-H on ad-hoc decision support systems, TPC-R on decision support systems with precomputed and materialized views. There are database benchmarks that focus on SAP workloads, and so on. The reason for this approach is that combining all these diverse use cases into a single benchmark would lead to a test scenario that does not represent anything in the real world. In the same spirit, TPoX focuses on just one of various common XML use cases. Other XML benchmarks that focus on other use cases, such as XML content and full-text search, are also desirable but have yet to be defined.

TPoX is entirely open-source (with major contributions from Intel and IBM). In TPoX 1.3, contributors from the University of Furtwangen in Germany added initial support for Oracle Database and Microsoft SQL Server. In particular, they adjusted the TPoX queries to support Oracle Database and SQL Server syntax, and they extended the TPoX workload driver so that it connects to Oracle Database and Microsoft SQL Server. Anybody, including Oracle, is welcome to enhance, revise, or modify the TPoX benchmark as they deem appropriate for meaningful benchmarking.

The TPoX benchmark is a useful measuring stick for the many organizations that have transactional systems with small XML documents. I am amused that Oracle, on the one hand, continually highlights the need for separately handling the diverse XML use cases, and then, on the other hand, complains that TPoX handles only one use case and not a diverse range of use cases. Don’t they realize that they are contradicting themselves? 🙂

Oracle also claims that TPoX attempts to follow the Transaction Processing Performance Council (TPC) approach, and that the TPC approach deviates from production system workloads. It is true that many people, including myself, consider some of the TPC benchmarks to have flaws. However, they still serve a purpose for people who are evaluating database options. Although the benchmarks are not a direct indication of performance in an end user’s environment, they are still a useful tool for indicating relative performance.

I am not aware of any alternative XML benchmarks proposed by Oracle. If Oracle has an XML benchmark that they believe is better, it would be great for everyone in the industry if they would bring it forward. Everyone would benefit from such a move, especially the people who are trying to evaluate their XML storage options. I am curious to know why Oracle hasn’t published any benchmark results for its XML capabilities, and instead focuses its efforts on debates that are difficult to resolve. IBM has published both TPoX benchmark results and internal benchmark results. When is Oracle going to step up to the plate?

Oracle has long claimed that the fact that Oracle Database has multiple different ways to store XML data is an advantage. At last count, I think they have something like seven different options:

  • Unstructured
  • XML-Object-Relational, where you store repeating elements in CLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as LOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as nested tables
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to BLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to nested tables
  • XML-Binary

Their argument is that XML has diverse use cases and you need different storage methods to handle those diverse use cases. I don’t know about you, but I find this list to be a little bewildering. How do you decide among the options? And what happens if you change your mind and want to change storage method?

But back to my original question… Why won’t Oracle publish results for the TPoX benchmark? Perhaps it is because Oracle is still trying to figure out which of its seven storage options is best to use 🙂