XML Storage in DB2: To inline or not to inline?

July 11, 2012

First, let’s revisit the concept of inlined XML storage and then discuss the pros and cons of inlining.

What is XML Inlining?

In short, inlining is an optional storage optimization in DB2 for “small” XML documents.

When you define a table with an XML column in DB2, such as CREATE TABLE mytable(id INTEGER, ….. , doc XML), the DB2 server creates three storage objects in the table space:

  • A data object (DAT), which holds the relational rows of the table
  • An index object (INX), which stores any indexes for the table
  • An XML storage object (XDA), which is the XML Storage Area and holds any XML documents

Optionally, you can assign these three objects to different table spaces but by default they all go into the same table space.

As illustrated in the following picture, the XML column in the data object doesn’t contain the actual XML document, but only references (descriptors) of where the documents can be found. The XML document trees are stored in the XDA object, and if a tree is larger than a single page then it is automatically cut into multiple regions. This way, large documents can span many pages, and that’s completely transparent to your application.

The region index -automatically defined and maintained by DB2- essentially remembers which regions belong to the same XML document for any given row in the data object. The regions index also enables very efficient access to any portion of an XML document. If only some part of a large document is required, e.g. to answer a query, DB2 does not necessarily need to bring all pages of the document into the buffer pool.

The XML documents that are small enough to fit onto a single page just have a single region, and multiple regions can be stored on the same page, if space is available.

As it turns out, there are very many applications that deal with small XML documents, often just 1KB to 20KB for most documents. The access to such small documents can be optimized by storing there document trees right in the DAT object, together with the rows that they belong to. This is called inlined XML storage and is illustrated in the following picture.

To enable inlining, define the XML column with an INLINE LENGTH that indicates that maximum size up to which you want documents to be inlined. For example:

CREATE TABLE mytable(id INTEGER, ….. , doc XML INLINE LENGTH 30000)

In this example, any XML documents whose parsed hierarchical representation is less than 30000 bytes will be inlined. Any documents that are larger than this are automatically stored in the XDA object as usual. The application sees no difference.

The specified INLINE LENGTH must be smaller than the page size of the DB2 table space minus the length of any other columns in the table.

Inlined documents are stored in the same tree format as documents that are not inlined. Just in a different location.

The Pros and Cons of Inlining

Performance measurements and plenty of experience with real XML applications have shown that inlining is almost always recommended, if a large percentage of your XML documents (for example, more than  70%) can be inlined. The benefits of inlining include the following:

  • Faster access to inlined documents – no redirection via the regions index
  • The regions index has no entries for inlined documents. If a large percentage of your documents are inlined then this reduces the space and maintenance cost associated with the regions index.
  • Better prefetching since inlined XML documents are prefetched as part of the row they belong to.

A key characteristic of inlining is that it drastically increases the row size on the data pages, and hence reduces the number of rows per data page. This can negatively impact the performance of queries that read only relational columns and do not access the XML column. For example, consider the following table and query:

CREATE TABLE myxml(c1 INT, c2 INT, c3 INT, doc XML INLINE LENGTH 30000);

SELECT SUM(c1 + c2 + c3)
FROM myxml;

This query reads the 3 integer columns from *all* rows in the table. Due to the inlined XML column, these rows are spread over a much larger number of pages than without inlining, so this query needs to fetch a lot more pages than without inlining. If you have many such queries, then inlining might not be the best choice for you.

How do I know whether a given document is inlined?

The DB2 function ADMIN_IS_INLINED can be applied to an XML column and it returns 1 if a document is inlined, and zero otherwise. This enables you to determine which and how many documents are inlined.

The DB2 function ADMIN_EST_INLINE_LENGTH can also be applied to an XML document in an XML column and returns the smallest required inline length for this document to be inlined, or -1 if the document is too large to be inlined.

About these ads

One Response to “XML Storage in DB2: To inline or not to inline?”


  1. I received a couple of good questions on XML inlining, and I will share the answers here:

    Q: If I alter an XML column to add the inlining option, what happens to existing documents in the column?

    A: Existing documents are not immediately inlined and remain in the XDA storage object. Inlining is applied to documents that are inserted, loaded, or updated *after* the column has been altered to enable inlining. To inline existing documents, i.e. to move them from the XDA object to DAT object after inlining has been enabled, you need to update the documents with themselves, such as:
    UPDATE mytable SET xmlcol = xmlcol ;

    Note that a bulk update of many XML documents can require a lot of log space. You might have to perform a series of smaller updates and commit frequently to avoid running out of log space. Afterwards you might want to reorg the table to reclaim the
    freed-up space in the XDA object. Reorg itself does not
    move XML data from the XDA object to the DAT object.

    - – -

    Q: Is there a sweet spot for the INLINE LENGTH?

    A: My recommendation is to set the INLINE LENGTH so that you can inline as many documents as possible. In many cases I would simply set the INLINE LENGTH close to the largest possible value for the given page size, minus the length of the other columns in the table.
    The main exception to this recommendation is if you might want to add additional columns to the table at a later point in time. Let’s say my page size is 32KB and my table has 10 columns with a total length of 2000 bytes plus an XML INLINE LENGTH of 30000 bytes. In this case, DB2 does not allow you to add another VARCHAR(1000) column to the table. Even though the inline length is not reserved on each page, a row that has 2000 bytes + 30000 bytes of XML + 1000 bytes varchar would not fit on a 32KB page. Hence, the VARCHAR(1000) column cannot be added.
    So, if you want to retain the flexibility to additional columns later, use a moderate inline length that still allows a large share of your documents to be inlined.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 50 other followers

%d bloggers like this: