XML Manipulation in Application code vs. a Native XML Database
January 8, 2011
XML processing is at the core of many middleware systems, often to route, split, or modify XML messages, or to extract values from them. Frequently, such XML processing is implemented with XSLT stylesheets or in application code with DOM or SAX parsers.
But, when XML messages are stored in a database system, application designers have a choice whether to perform XML manipulation in the application code or in the database server with XQuery or SQL/XML.
When a native XML database stores XML in a parsed format, such as in DB2 pureXML, then most XQuery and SQL/XML operations can be performed without additional XML parsing. This can be a significant performance benefit of processing XML in the database layer.
Coding XML manipulation declaratively in XQuery or SQL/XML is typically shorter than writing procedural application code that uses DOM or SAX. Hence, there is also a difference in development cost.
We have performed some tests to compare the performance and development cost of using SQL/XML or XQuery in DB2 as opposed to using DOM or SAX in a Java application. We started by defining several common XML processing uses cases:
- Use Case 1.1 (“Extract5”): Extract a small number of values from randomly selected XML documents (approx5% of the element values)
- Use Case 1.2 (“Extract50”): Extract a medium number of values (50%).
- Use Case 1.3 (“Extract100”): Extract most values (almost 100%).
- Use Case 2.1 (“Split500”): Split a large XML document containing 500 concatenated XML fragments into individual XML documents. Concatenated XML documents are often used for download or FTP, but are not useful for individual processing.
- Use Case 2.2 (“Split2500”): Split a document containing 2500 fragments.
- Use Case 2.3 (“Split5000”): Split a document containing 5000 fragments.
- Use Case 3 (“Modify”): Insert, change, and delete XML elements in each randomly selected XML document.
We have implemented these uses case in a Java application in two ways:
(a) using SQL/XML and XQuery to push the XML manipulation to a DB2 9.7 database, and
(b) using a DOM and/or SAX parser in the Java code itself.
Both options read or write XML to/from a DB2 database, but the location and implementation of the XML manipulation differs. In option (a), the XML manipulation happens in the database as part of the database read or write operations. Option (b) reads and writes full documents from/to the database, but XML specific operations happen exclusively in the Java code. We measured all tests on the same hardware to ensure an apples-to-apples comparison.
Let’s look at some of the results…
The following chart shows the throughput of the three “Extract” use cases in transactions per second. The SQL/XML-based value extraction in the database provides notably higher throughput than SAX parsing in Java, especially when the number of extracted values is small.
We observed similar performance benefits for performing “Split” operations (use cases 2.1, 2.2, and 2.3) in the database.
Modifications to XML document are often implemented with XSLT or a DOM parser in application code. However, we find that inserting, updating, or deleting XML elements and attributes in a document can be significantly faster when XQuery Update expressions are run inside the database. The next chart shows the throughput for a mix of common XML Update operations for 1, 5, 10, and 15 concurrent users:
Even more important than the performance benefit of in-database XML manipulation might be the reduction in development cost.
We counted the lines of code for both of our implementations, including the Java code to prepare and submit SQL/XML or XQuery statements and to bind extracted values to Java objects. The following chart shows that application-level XML processing can require 10x to 13x as many lines of code than database-level XML manipulation.
This is very significant since the number of lines of code is a common metric to estimate the software development cost, number of coding errors, and software maintenance cost over time. Although some (but not nearly all!) of the development labor can be reduced by using XML mapping frameworks (such as JAXB) XQuery and SQL/XML are still much shorter and typically less costly to develop and maintain.
For more details on this topic, see: