XML Enhancements in DB2 10 for Linux, UNIX, and Windows

April 19, 2012

You might have noticed that DB2 10 for Linux, UNIX, and Windows was announced earlier this month, and it contains a bunch of great new features. One of the new features that I am particularly excited about is Time Travel, which enables bitemporal data management in DB2.

DB2 10 also includes a variety of useful XML enhancements. Let’s take a quick look at the most important ones here:

New XML Index Data Types: Integer, Decimal

As you know, when you create an index on an XML element or attribute you must specify the data type that DB2 should use to interpret and index the specified nodes. The reason is that the usage of XML Schema, which might define data types, is optional. And even if an XML Schema is used, it might not define a data type for each and every element and attribute.

Before DB2 10, an XML index for numeric values had to be specified with the data type DOUBLE, so that any type of numeric value could be indexed. In DB2 10 you can also specify the data types INTEGER and DECIMAL to define a numeric XML index.

If you know that a certain element or attribute can only contain INTEGER (or DECIMAL) values, then you can improve query performance by creating the XML index with the most appropriate type. For example, assume that dept is a table that has an XML column deptdoc. Then you can define the following indexes in DB2 10:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/@id' as sql integer;

create index idx2 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/salary' as sql decimal(6,2);

You don’t need to change existing queries to use these new indexes!
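To illustrate, here is a minimal sketch of queries whose predicates match the indexed paths. The literal values 42 and 5000 are made up for illustration, and whether the optimizer actually picks idx1 or idx2 depends on factors such as statistics and data volume:

```sql
-- Equality predicate on the employee id attribute (path indexed by idx1 as INTEGER).
-- The value 42 is a hypothetical example.
SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[@id = 42]');

-- Range predicate on the salary element (path indexed by idx2 as DECIMAL(6,2)).
-- The value 5000 is a hypothetical example.
SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[salary > 5000]');
```

These are ordinary XMLEXISTS queries; nothing in them refers to the index or its data type.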

XML Indexes for Case-Insensitive Search

DB2 10 allows you to define case-insensitive indexes and search very efficiently for string values regardless of their upper case or lower case spelling. For example, to enable case-insensitive search of employee names, you could create the following index:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name/fn:upper-case(.)' as sql varchar(40);

This index stores all key values in upper-case. To find employees with names such as Peter Pan, PETER PAN,  Peter PAN, pEtEr pAn, etc. you could write the following query that can use the index above:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[name/fn:upper-case(.) = "PETER PAN"]');

Index support for Prefix Search

Previously, the following query with the fn:starts-with() function could not use an XML index to find all employees whose name starts with “Jo”:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:starts-with(name, "Jo")]');

In DB2 10, this query can benefit from XML indexes of type VARCHAR(n) that are defined on the <name> element.
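For instance, an index like the following sketch would be a candidate for the prefix query above. The index name idx_name and the length 40 are assumptions chosen for illustration:

```sql
-- Plain string index on the employee name element.
-- idx_name and varchar(40) are hypothetical choices, not taken from a real schema.
create index idx_name on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name' as sql varchar(40);
```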

XML Indexes for Existential (Structural) Predicates

An existential predicate, sometimes called structural predicate, is a predicate that checks the existence of an XML element or attribute in an XML document, regardless of its value.

For example, imagine you want to find all documents that contain a <comment> element for an employee. One of the following queries comes to mind:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[comment]');

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:exists(comment)]');

To support such queries, DB2 10 allows you to create an XML index with the fn:exists function, like this:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/fn:exists(comment)' as sql varchar(1);

This index must be defined as VARCHAR(1) and each index key represents true or false to indicate the existence of the <comment> element in a given document.

Friendly XML query behavior

DB2 10 also includes several enhancements that make XML queries behave in a more friendly manner. Let’s look at just one of them. Consider the following query, which has a numeric predicate on the @id attribute:

select *
from dept
where xmlexists('$DEPTDOC/dept[@id = 15]');

What happens if this query encounters an XML document where the @id attribute has a non-numeric string value such as “WW Sales”? In DB2 9.x, this situation caused a type error at run-time, and the query failed with an error message. Although this behavior is compliant with the language standard, it is not very useful: if you are looking for the department with id = 15, a department with a non-numeric id can never be a match. Therefore, DB2 10 simply evaluates the predicate to FALSE in this case, i.e. it ignores the document with the non-numeric id and continues processing the query.
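Here is a small sketch of that scenario. The table dept_demo and both sample documents are invented for illustration:

```sql
-- Hypothetical demo table with a single XML column.
CREATE TABLE dept_demo (deptdoc XML);

-- One document with a numeric id, one with a non-numeric id.
INSERT INTO dept_demo VALUES (XMLPARSE(DOCUMENT '<dept id="15"><name>Sales</name></dept>'));
INSERT INTO dept_demo VALUES (XMLPARSE(DOCUMENT '<dept id="WW Sales"><name>Worldwide Sales</name></dept>'));

-- Numeric predicate that has to be evaluated against both documents.
SELECT *
FROM dept_demo
WHERE XMLEXISTS('$DEPTDOC/dept[@id = 15]');
```

Based on the behavior described above, DB2 10 should skip the second document and return only the first, whereas DB2 9.x would stop with a type error.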

Binary XML client/server communication

With DB2 10, Java applications can choose the format in which XML is transmitted between the application and the DB2 server. You can choose between traditional textual XML and a new binary XML format that is a compressed on-the-wire format for XML.

On the application side, the encoding and decoding of XML to and from the binary format is performed by the JDBC driver (JCC 4.9 or higher). The JDBC driver can convert the binary XML format to and from a text, DOM, SAX, or StAX representation. The use of binary XML provides the most notable performance gain for Java applications that already consume or produce XML in SAX format.

Applications can exchange XML with the DB2 server (through inserts, queries, etc.) and completely avoid the serialization of XML to text, i.e. neither the client nor the server needs to deal with textual XML for the purpose of inserting or retrieving XML. Eliminating the parsing and serialization of textual XML can enable significantly better end-to-end performance.

Other IBM products that support this binary XML communication format include DB2 10 for z/OS, the WebSphere Application Server XML Feature Pack, and the WebSphere DataPower appliance.

Query Performance Enhancements

DB2 10 also includes a number of performance improvements for common types of XML queries, including many types of queries that use the XMLTABLE function.

6 Responses to “XML Enhancements in DB2 10 for Linux, UNIX, and Windows”

  1. Crina Says:

    Is there a reason for not supporting the creation of an index like:
    /tag1/tag2[@attrib='some_value']/tag3 ?


    • Hi Crina,

      that’s a valid question. There is no technical reason why such types of XML indexes could not be done. Allowing predicates in index definitions adds a fair amount of complexity to index maintenance and query compilation in DB2, and the question is whether this is worth the investment.

      Where your example shows [@attrib='some_value'], would you also want to allow multiple predicates such as [@a = 5 and b = 10], or even something like [@a = 5 and ../b/c/d < e/f], or even arbitrary XPath or XQuery expressions?

      So, let me ask in return: is there a strong reason why such indexes are required and why (for your example) an index on /tag1/tag2/tag3 would not be sufficient?

      Thanks,

      Matthias

      • Crina Says:

        Thanks for the answer.
        There are standards that recommend XML formats as follows:

        or just text direct

        text2

        There can be 10-20 tag2, with different attributes. 2-3 are always used for fast querying. Those are worth indexing.
        Indexing /tag1/tag2/@attrib or /tag1/tag2/tag3 would increase disk space and would not be sufficiently fast (it indexes more than needed).
        I believe above example is a frequent scenario. It may not need complex predicates, like those in your example.

        For now the solution is to have separate columns for those 2-3 values to index.

      • Crina Says:

        Hmm. The example didn’t post correctly. Reposting.

        <tag1>
        <tag2 attrib='val1'>
        <tag3>or just text direct</tag3>
        </tag2>
        <tag2 attrib='val2'>
        <tag3>text2</tag3>
        </tag2>

        </tag1>

  2. Roger Says:

    Hello Matthias,

    With these indexes on an XML column, when I submit an SQL/XML query with XMLEXISTS that utilizes the index, does this update the LASTUSED column in SYSCAT.INDEXES?

    I don’t seem to see it updated although tests on huge data reduced time by 75%.

