Native XML versus CLOB and Shredding
March 20, 2008
Matthias Nicola and Vitor Rodrigues wrote an excellent paper comparing the performance of IBM’s native XML storage (called pureXML) to non-native storage. It makes for very interesting reading. I will include some highlights here.
Traditionally, there were two approaches to storing XML data in a relational database: using a CLOB or shredding.
If XML data is stored in a character large object (CLOB) field, the data is typically inserted as unparsed text. This avoids XML parsing at insert time, however it requires XML parsing at query execution time. This leads to low search and extract performance. For instance, look at the following graph (lower numbers are better).
This table shows the results for five types of queries:
Select*, which is full document retrieval of all documents, no predicate
1Pred1Doc, which is full document retrieval of one document matching one predicate
5PredSome, which is full document retrieval of documents matching multiple predicates
PartialAll, which is partial retrieval of all documents
PartialSome, which is partial retrieval of all documents matching certain criteria
For most queries, native XML storage significantly outperforms CLOB storage. Only full document retrieval that ignores the XML structure quickly reads XML data from CLOB fields.
With shredding, the XML data is mapped to a relational structure (which is then stored in a relational database). Here you can see that XQuery (which is used for the native XML) outperforms SQL querying with XML conversion (which is used for shredding) for most types of query. However, because searching over relational data is faster, the query that retrieves part of the data across all records is faster for shredded data.
These findings show that, in most circumstances, native XML storage provides significant performance gains over CLOB storage and shredding. However, I show here only part of the story. For instance, I did not include performance information for the ingestion of data. I encourage you to read the full paper at A performance comparison of DB2 9 pureXML and CLOB or shredded XML storage.