Native XML versus CLOB and Shredding

March 20, 2008

Matthias Nicola and Vitor Rodrigues wrote an excellent paper comparing the performance of IBM’s native XML storage (called pureXML) to non-native storage. It makes for very interesting reading. I will include some highlights here.

Traditionally, there were two approaches to storing XML data in a relational database: using a CLOB or shredding.

If XML data is stored in a character large object (CLOB) field, the data is typically inserted as unparsed text. This avoids XML parsing at insert time, however it requires XML parsing at query execution time. This leads to low search and extract performance. For instance, look at the following graph (lower numbers are better).

This table shows the results for five types of queries:

  • Select*, which is full document retrieval of all documents, no predicate
  • 1Pred1Doc, which is full document retrieval of one document matching one predicate
  • 5PredSome, which is full document retrieval of documents matching multiple predicates
  • PartialAll, which is partial retrieval of all documents
  • PartialSome, which is partial retrieval of all documents matching certain criteria

For most queries, native XML storage significantly outperforms CLOB storage. Only full document retrieval that ignores the XML structure quickly reads XML data from CLOB fields.

With shredding, the XML data is mapped to a relational structure (which is then stored in a relational database). Here you can see that XQuery (which is used for the native XML) outperforms SQL querying with XML conversion (which is used for shredding) for most types of query. However, because searching over relational data is faster, the query that retrieves part of the data across all records is faster for shredded data.

These findings show that, in most circumstances, native XML storage provides significant performance gains over CLOB storage and shredding. However, I show here only part of the story. For instance, I did not include performance information for the ingestion of data. I encourage you to read the full paper at A performance comparison of DB2 9 pureXML and CLOB or shredded XML storage.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: