The XML Extender was introduced in DB2 7.1 (yes, that long ago!) to enable applications to store, query, update, construct, and validate XML documents in DB2 for Linux, UNIX, Windows and DB2 for z/OS.

In 2006, DB2 9.1 introduced pureXML – a set of native XML features that are deeply integrated into the DB2 engine. DB2 pureXML is vastly superior to the XML Extender, in both functionality and performance.

The XML Extender is still available in DB2 9.1 and 9.5 but has been discontinued in DB2 9.7. Hence, if you are using the XML Extender and want to move to DB2 9.7 or DB2 10, you need to migrate your XML application to DB2 pureXML.

Although the migration from the XML Extender to pureXML requires some considerations, it is not very hard. The migration to pureXML yields dramatic performance benefits and often simplifies your solution. You will need to migrate eventually, and the sooner you do it the earlier you will reap the benefits of significantly better performance and greater flexibility.

The details of the migration depend very much on which parts of the XML Extender functionality you are using, such as:

  1. Shredding XML to relational tables (called “XML collection” in XML Extender lingo)
  2. Construction of XML documents from relational tables
  3. Storage of XML documents in db2xml.xmlclob columns
    • with or without side tables?
    • extraction of values with extract UDFs?
    • XML updates with the update UDF?
  4. XML validations against DTDs or XML Schemas
  5. Other XML Extender features such as XSLT transformations or interaction with WebSphere MQ message queues.

Let’s briefly discuss these one by one:

1. Shredding XML to relational tables

If you are using the XML Extender to shred XML into a set of relational tables, e.g. with the db2xml.dxxInsertXML stored procedure, you have two options:

(a) Consider whether you can avoid the shredding and instead store and query your XML documents in a column of type XML. Inserting documents into an XML column is significantly faster than shredding, and you can build indexes and express XML queries on XML columns.

(b) DB2 pureXML still offers shredding capabilities: you can use the XMLTABLE function or annotated XML Schemas to convert XML to relational format. These options are typically 5x to 8x faster than the old XML Extender shredding and still make the data available in the same relational tables as before, so that consuming applications don’t need any changes.

For more details on shredding in DB2 9.1 or higher, see the free sample Chapter of the DB2 pureXML Cookbook. That chapter also discusses the pros and cons of XML columns vs shredding, which is also examined in a recent developerWorks article.

If you choose to continue to shred XML into relational tables, note that shredding with the XMLTABLE function can sometimes be simpler and more flexible than using annotated XML Schemas. However, if you have a large number of XML Extender DADs, you might benefit from a free tool that converts DADs to annotated XML Schemas. This tool is available for download at the bottom of the article “From DAD to annotated XML schema decomposition”.
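To make option (b) a bit more concrete, here is a minimal sketch of shredding with XMLTABLE. The target table person_rel, its columns, and the sample document are made up for illustration; in a real application the document would typically arrive through a parameter marker rather than a literal.

```sql
-- Hypothetical target table; names and sizes are illustrative only.
CREATE TABLE person_rel (
  id        INTEGER,
  firstname VARCHAR(30),
  lastname  VARCHAR(30)
);

-- Shred one document into relational rows.
-- XMLPARSE turns the string literal into an XML value that XMLTABLE can navigate.
INSERT INTO person_rel (id, firstname, lastname)
SELECT t.id, t.firstname, t.lastname
FROM XMLTABLE('$d/Message/Person'
       PASSING XMLPARSE(DOCUMENT
         '<Message><Person id="123"><FirstName>Robert</FirstName><LastName>Tester</LastName></Person></Message>') AS "d"
       COLUMNS
         id        INTEGER     PATH '@id',
         firstname VARCHAR(30) PATH 'FirstName',
         lastname  VARCHAR(30) PATH 'LastName') AS t;
```

Each <Person> element that matches the row-generating expression becomes one row, so a document with several persons would produce several rows in a single INSERT.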

2. Constructing XML from relational tables

In DB2 9.1 and higher, construction of XML documents from relational tables is performed with SQL/XML construction functions that are part of the SQL standard. These functions include XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLCONCAT, XMLAGG, and XMLNAMESPACES.

As part of the migration you would write SQL statements (or stored procedures, if you prefer) that read relational data and produce XML with these functions.
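As a rough sketch of what such a construction query can look like, the following statement builds one <dept> document per department. The tables department and employee and their columns are hypothetical and only serve the illustration.

```sql
-- Assumed (hypothetical) tables:
--   department(deptno)
--   employee(deptno, firstname, lastname)
SELECT XMLELEMENT(NAME "dept",
         XMLATTRIBUTES(d.deptno AS "id"),
         -- one <employee> element per joined row, aggregated per department
         XMLAGG(
           XMLELEMENT(NAME "employee",
             XMLFOREST(e.firstname AS "first",
                       e.lastname  AS "last"))
           ORDER BY e.lastname)) AS deptdoc
FROM department d
JOIN employee e ON e.deptno = d.deptno
GROUP BY d.deptno;
```

XMLFOREST creates one element per non-null column value, which is often a convenient shorthand for a series of XMLELEMENT calls.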

3. Storage of XML documents in db2xml.xmlclob columns

If you are using the XML Extender to store XML in db2xml.xmlclob or db2xml.xmlvarchar columns, you will migrate these columns to native XML columns. A previous blog post describes how to move data from a CLOB column to an XML column.

If you have used so-called side tables with your db2xml.xmlclob columns, you should replace those with native XML indexes. Note that DB2 10 contains some great XML index enhancements. Using XML indexes instead of side tables is simpler, more flexible, and provides better performance.

The XML Extender also offered a set of EXTRACT functions that enable applications to use XPath to extract individual values from XML documents in a db2xml.xmlclob column. In the migration you will replace these EXTRACT functions in your queries with the SQL/XML functions XMLQUERY, XMLEXISTS, or XMLTABLE, which are faster and more powerful. For details, see the articles “Which way to query your XML data?” and “XMLTABLE By Example”.

If you are using the XML Extender UPDATE function to modify XML in db2xml.xmlclob columns, you will migrate those calls to native XML updates.
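The three replacements described in this section can be sketched as follows. The table customer(id, doc), the document structure /customer/phone, and the index name are assumptions for illustration only. The update uses an XQuery transform expression, which requires DB2 9.5 or later, and the XMLCAST assumes at most one <phone> element per document.

```sql
-- 1. An XML index takes over the role of a side table.
CREATE INDEX custphone ON customer(doc)
  GENERATE KEY USING XMLPATTERN '/customer/phone' AS SQL VARCHAR(20);

-- 2. XMLQUERY and XMLEXISTS take over the role of the extract UDFs.
SELECT XMLCAST(XMLQUERY('$d/customer/phone' PASSING doc AS "d")
               AS VARCHAR(20)) AS phone
FROM customer
WHERE XMLEXISTS('$d/customer[phone = "555-0199"]' PASSING doc AS "d");

-- 3. A transform expression takes over the role of the update UDF.
UPDATE customer
SET doc = XMLQUERY('copy $new := $d
                    modify do replace value of $new/customer/phone
                           with "555-0100"
                    return $new' PASSING doc AS "d")
WHERE id = 1;
```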

4. XML validations against DTDs or XML Schemas

If you are using DTDs to validate your XML documents, you can convert the DTDs to an equivalent XML Schema and use the XML Schema capabilities that are part of DB2 pureXML. That is, you can register XML Schemas in DB2’s schema repository and validate XML documents during LOAD, insert, update, select, or other operations.

5. Other XML Extender features such as XSLT transformations or interaction with MQ

If you are using XSLT style sheets with the XML Extender, you can continue using the same style sheets with the XSLTRANSFORM function. This function also allows you to pass parameters to a style sheet at runtime. Also, the DB2 functions to read and write XML documents from/to WebSphere MQ message queues are still available as part of the DB2 Federation capabilities that provide connectivity between DB2 and MQ.
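For orientation, a call to XSLTRANSFORM might look like the sketch below. The table xml_tab and its columns are hypothetical; the style sheet is assumed to be stored in a CLOB column next to the document it applies to.

```sql
-- Hypothetical table that keeps each document together with its style sheet.
CREATE TABLE xml_tab (
  docid   INTEGER,
  xml_doc XML,
  xsl_doc CLOB(1M)
);

-- Apply the style sheet to the document and return the result as a CLOB.
SELECT XSLTRANSFORM(xml_doc USING xsl_doc AS CLOB(1M))
FROM xml_tab
WHERE docid = 1;
```

Style sheet parameters, if needed, are supplied through an additional WITH clause that takes an XML document containing the parameter values.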

These are the main topics that need consideration for moving from the XML Extender to DB2 pureXML. Additional information is available in the DB2 Information Center here and here.


If you want to learn more about XML indexes in DB2, including the XML index enhancements in DB2 10, join the DB2 pureXML devotee conference call on Wed, June 13, at 10am US Eastern Time. For further details and dial-in numbers, see:


First, what is a CDATA section in XML? CDATA stands for character data, and a CDATA section in an XML document defines a piece of text that should not be parsed by an XML parser.

Normally, an XML parser examines every character in an XML document, even the characters that make up the text value of an XML element. The reason for this behavior is that one element could contain other elements, and so the parser must look for the “<” character that would denote the beginning of another element.

However, if you want the parser to ignore certain portions of text then you can enclose this text in a CDATA section. A CDATA section starts with <![CDATA[  and ends with ]]>.

A CDATA section can be useful if the text value of an XML element contains any reserved characters. For example, the characters “<” and “&” have a special meaning in XML and must not appear unescaped in the text value of an element. (The “&” marks the beginning of an entity reference in XML.) But, if text is declared as a CDATA section then the parser treats its characters as plain text, including any characters that would otherwise be illegal.

Let’s look at an example where I’m inserting the string “Food & Wine” with the reserved character “&” in it.

The first insert fails because the “&” is not properly escaped and not included in a CDATA section. The other 4 inserts are successful. In these 4 cases the CDATA section spans different portions of the string but always includes the “&”:

create table cdata(id int, doc XML);

insert into cdata
values(0, '<doc><name>Food & Wine</name></doc>');

insert into cdata
values(1, '<doc><name><![CDATA[Food & Wine]]></name></doc>');

insert into cdata
values(2, '<doc><name><![CDATA[Food &]]> Wine</name></doc>');

insert into cdata
values(3, '<doc><name>Food <![CDATA[& Wine]]></name></doc>');

insert into cdata
values(4, '<doc><name>Fo<![CDATA[od & Wi]]>ne</name></doc>');

When you insert and later retrieve an XML document in DB2, the CDATA sections are not preserved. Instead, any offending characters in the CDATA section are escaped according to the XML standard. Upon insert the CDATA sections are removed, and upon retrieval the “&” is escaped as &amp; as it should be. Similarly, a “<” sign in the CDATA section would be escaped to &lt; .

select * from cdata;

1      <doc><name>Food &amp; Wine</name></doc>
2      <doc><name>Food &amp; Wine</name></doc>
3      <doc><name>Food &amp; Wine</name></doc>
4      <doc><name>Food &amp; Wine</name></doc>

4 record(s) selected.

To be precise, the escaping of reserved characters happens upon XML serialization to text.

Now what happens if we want to index and query such XML that has (had) CDATA sections in it? Well, let’s give it a try. First, let’s create an index and then query for “Food & Wine” with the “&” in it:

create index idx1 on cdata(doc) generate keys using xmlpattern '/doc/name' as sql varchar(35);

select id
 from cdata
 where xmlexists('$DOC/doc/name[ . = "Food & Wine"]');

SQL16002N  An XQuery expression has an unexpected token "&" following "Food ".

The query fails, as expected, because the “&” is a reserved character. We need to follow XQuery rules and escape the “&”:

select id
 from cdata
 where xmlexists('$DOC/doc/name[ . = "Food &amp; Wine"]');


4 record(s) selected.

As expected, all 4 rows are returned. If you check the execution plan you will also find that the index can be used.

Conclusions: The use of CDATA sections does not impair the queryability and indexability of XML elements in DB2. Also, it doesn’t matter whether you wrap the CDATA section just around the offending characters or around a larger piece of text in the element value.


Of course, in this simple example we wouldn’t necessarily have to use a CDATA section to escape the “&” in the input documents. We could have escaped the “&” as “&amp;” before inserting the documents. But, if an element contains a longer piece of text that comes from an XML-agnostic source (i.e. various illegal characters may be present), then it’s much easier to make it a CDATA section than to find and escape the offending characters one by one.
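As a small sketch of the escaping alternative, the following insert uses &amp; instead of a CDATA section (the id value 5 is just another sample row). The second statement casts the element value to a SQL VARCHAR with XMLCAST, which returns the text value itself rather than serialized XML, so no escaping is applied to the result.

```sql
-- Same content as rows 1 to 4, but escaped instead of wrapped in CDATA.
insert into cdata
values(5, '<doc><name>Food &amp; Wine</name></doc>');

-- Retrieve the element value as a SQL string instead of serialized XML.
select id,
       xmlcast(xmlquery('$d/doc/name' passing doc as "d") as varchar(35)) as name
from cdata;
```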



You might have noticed that DB2 10 for Linux, UNIX, and Windows was announced earlier this month, and it contains a bunch of great new features. One of the new features that I am particularly excited about is Time Travel, which enables bitemporal data management in DB2.

DB2 10 also includes a variety of useful XML enhancements. Let’s take a quick look at the most important ones here:

New XML Index Data Types: Integer, Decimal

As you know, when you create an index on an XML element or attribute you must specify the data type that DB2 should use to interpret and index the specified nodes. The reason is that the usage of XML Schema, which might define data types, is optional. And even if an XML Schema was used, it might not define a data type for each and every element and attribute.

Before DB2 10, an XML index for numeric values is specified with the data type DOUBLE, so that any type of numeric values can be indexed. In DB2 10 you can also specify the data types INTEGER and DECIMAL to define a numeric XML index.

If you know that a certain element or attribute can only contain INTEGER (or DECIMAL) values, then you can improve query performance by creating the XML index with the most appropriate type. For example, assume that dept is a table that has an XML column deptdoc. Then you can define the following indexes in DB2 10:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/@id' as sql integer;

create index idx2 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/salary' as sql decimal(6,2);

You don’t need to change existing queries to use these new indexes!
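For example, queries with numeric predicates such as the following are the kind of statements that could be candidates for these indexes. This is only a sketch: the literal values are made up, and whether an index is actually chosen depends on the optimizer and your data.

```sql
-- Numeric comparison on the @id attribute (candidate for the INTEGER index).
select *
from dept
where xmlexists('$d/dept/employee[@id = 15]' passing deptdoc as "d");

-- Numeric comparison on the salary element (candidate for the DECIMAL index).
select *
from dept
where xmlexists('$d/dept/employee[salary > 5000.00]' passing deptdoc as "d");
```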

XML Indexes for Case-Insensitive Search

DB2 10 allows you to define case-insensitive indexes and search very efficiently for string values regardless of their upper case or lower case spelling. For example, to enable case-insensitive search of employee names, you could create the following index:

create index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name/fn:upper-case(.)' as sql varchar(40);

This index stores all key values in upper-case. To find employees with names such as Peter Pan, PETER PAN,  Peter PAN, pEtEr pAn, etc. you could write the following query that can use the index above:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[name/fn:upper-case(.) = "PETER PAN"]');

Index support for Prefix Search

Previously, the following query with the fn:starts-with() function could not use an XML index to find all employees whose name starts with “Jo”:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:starts-with(name, "Jo")]');

In DB2 10, this query can benefit from existing XML indexes of type VARCHAR(n) that may exist on the <name> element.
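Such an index could be defined as in the following sketch. The index name idx_empname and the length 40 are arbitrary choices for illustration.

```sql
-- A regular VARCHAR index on the <name> element; in DB2 10 it can also
-- support prefix predicates such as fn:starts-with(name, "Jo").
create index idx_empname on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name' as sql varchar(40);
```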

XML Indexes for Existential (Structural) Predicates

An existential predicate, sometimes called structural predicate, is a predicate that checks the existence of an XML element or attribute in an XML document, regardless of its value.

For example, imagine you want to find all documents that contain a <comment> element for an employee. One of the following queries comes to mind:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[comment]');

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:exists(comment)]');

To support such queries, DB2 10 allows you to create an XML index with the fn:exists function, like this:

create index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/fn:exists(comment)' as sql varchar(1);

This index must be defined as VARCHAR(1) and each index key represents true or false to indicate the existence of the <comment> element in a given document.

Friendly XML query behavior

DB2 10 also includes several enhancements that make XML queries behave in a more friendly manner. Let’s look at just one of several cases. For example, consider the following query that has a numeric predicate on the @id attribute:

select *
from dept
where xmlexists('$DEPTDOC/dept[@id = 15]');

What happens if this query encounters an XML document where the @id attribute has a non-numeric string value such as “WW Sales”?  In DB2 9.x this situation would cause a type error at run-time and the query would fail and abort with an error message. Although this behavior was compliant with the language standard, it is not very useful. If you look for the department with id = 15, any department with non-numeric id can never be a match. Therefore, DB2 10 simply evaluates the predicate to FALSE in this case, i.e. ignores the document with the non-numeric id, and continues processing the query.

Binary XML client/server communication

With DB2 10, Java applications can choose the format in which XML is transmitted between the application and the DB2 server. You can choose between traditional textual XML and a new binary XML format that is a compressed on-the-wire format for XML.

On the application side, the encoding and decoding of XML to and from the binary format is performed by the JDBC driver (JCC 4.9 or higher). The JDBC driver can convert the binary XML format to and from a text, DOM, SAX, or StAX representation. The use of binary XML provides the most notable performance gain for Java applications that already consume or produce XML in SAX format.

Applications can exchange XML with the DB2 server (through inserts, queries, etc.) and completely avoid the serialization of XML to text, i.e. neither the client nor the server needs to deal with textual XML for the purpose of inserting or retrieving XML. Eliminating the parsing and serialization of textual XML can enable significantly better end-to-end performance.

Other IBM products that support this binary XML communication format include DB2 10 for z/OS, the WebSphere Application Server XML Feature Pack, and the WebSphere DataPower appliance.

Query Performance Enhancements

DB2 10 also includes a number of performance improvements for common types of XML queries, including many types of queries that use the XMLTABLE function.

What is an XML Schema? Some of you may already know this, others don’t. So before I share some more technical information about XML Schemas in subsequent blog posts, I’d better get some of the basics out of the way first.

When you process and manage information in XML format, you can choose to use an XML Schema with your XML documents. Roughly speaking, an XML Schema can be used to define what you want your XML documents to look like. For example, in an XML Schema you can define:

  • Which elements and attributes are allowed to occur in your XML documents
  • How the elements can be or must be nested, or the order in which the elements must appear
  • Which elements or attributes are mandatory vs. optional
  • The number of times a given element can be repeated within a document (e.g. to allow for multiple phone numbers per customer, multiple items per order, etc.)
  • The data types of the element and attribute values, such as xs:integer, xs:decimal, xs:string, etc.
  • The namespaces that the elements belong to
  • …and so on.

If you choose to create an XML Schema, it may define just some or all of the aspects listed above. The designer of the XML Schema can choose the degree to which the schema constrains the characteristics of the XML documents. For example, an XML Schema can be very loose and define only a few key features for your XML documents and allow for a lot of flexibility. Or it can be very strict to tightly control the XML data in every aspect. Or anything in between.

The use of an XML Schema is optional, i.e. an XML Schema is not required to store, index, query, or update XML documents. However, an XML Schema can be very useful to ensure that the XML documents that you receive or  produce are compliant with certain structural rules that allow applications to process the XML. In other words, XML Schemas help you to enforce data quality.


If a document complies with a given XML Schema, then the document is said to be valid for this schema. A document might be valid for one schema but invalid for another schema. The process of testing an XML document for compliance with an XML Schema is called validation.

When an XML document is parsed by an XML parser, validation can be enabled as an optional part of the parsing process. Full validation of an XML document always requires XML parsing. For many documents and schemas, validation typically incurs only a small delta cost (in terms of CPU usage) on top of the cost of XML parsing.

What does an XML Schema look like?

An XML Schema itself is an XML document! But, a very special document that needs to comply with very specific rules that are defined by -you guessed it!- another XML Schema, i.e. the schema for schemas.

Large XML schemas can consist of multiple schema documents that reference each other through import and include relationships. This allows you to compose an XML Schema out of smaller building blocks in a modular fashion.

I don’t want to go into the syntax details of XML Schemas here, but there are some useful resources available:

When and why should I use an XML Schema?

Simply put, if you want to ensure data quality and detect XML documents that do not comply with an expected format, use an XML Schema and validate each document!

However, what if XML documents pass through multiple components of your IT infrastructure, such as a message queue, an application server, an enterprise service bus, and the database system? If these components do not modify the XML but merely read & route it, examine whether all of these components need to validate each document. For example, if the application server has already validated a document before insertion into a DB2 database, does the document need to be validated again in DB2? Maybe not, if you trust the application layer. Maybe yes, if you don’t.

An XML Schema is also often used as a “contract” between two or more parties that exchange XML documents. With this contract the parties agree on a specific format and structure of the XML messages that they send and receive, to ensure seamless operation.

Practically every vertical industry has defined XML Schemas to standardize XML message formats for the data processing in their industry. A good overview is given by the introduction of this article:

“Getting started with Industry Formats and Services with pureXML”:

How can I validate XML documents in DB2?

Simple. First, you register one or multiple XML Schemas in the DB2 database. This can be done with CLP commands, stored procedures, or through API calls in the JDBC or .NET interface to DB2. After a schema is registered in DB2, you can use it to validate XML documents in DB2, typically when you insert, load, or update XML documents. You can enforce a single XML Schema for all XML documents in an XML column, or you can allow multiple XML Schemas per column. A database administrator can force automatic validation upon document insert, or allow applications to choose one of the previously registered schemas for validation whenever a document is inserted.

And… validation can also be done in SQL statements?

Yup. The SQL standard defines a function called XMLVALIDATE, which can be used for document validation in INSERT statements, UPDATE statements, triggers, stored procedures, and even in queries.

Here is a simple example of an INSERT statement that adds a row to a customer table, which consists of an integer ID column and an XML column called “doc”:

INSERT INTO customer(id, doc)
VALUES (?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.custxsd));

The id and the document are provided by parameter markers “?”, and the XMLVALIDATE function that is wrapped around the second parameter ensures validation against the XML Schema that has been registered under the identifier db2admin.custxsd.

If the inserted document is not compliant with the XML Schema, the INSERT statement fails with an appropriate error message. Similarly, the XMLVALIDATE function can also be used in the right-hand side of the SET clause of an UPDATE statement that modifies or replaces an XML document.
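A sketch of the UPDATE case could look as follows, again using parameter markers for the new document and the id. The second statement is an optional extra: it uses the IS VALIDATED predicate (available in DB2 9.5 and later) to list the rows whose documents have been validated against that schema.

```sql
-- Replace a document and validate the new version against the registered schema.
UPDATE customer
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.custxsd)
WHERE id = ?;

-- Find the rows whose documents were validated against that schema.
SELECT id
FROM customer
WHERE doc IS VALIDATED ACCORDING TO XMLSCHEMA ID db2admin.custxsd;
```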

Ok, so much for now. In my next blog post we’ll go into more detail.

A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling” and in a previous blog post I have discussed several SQL/XML queries that can do this.

An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is -unfortunately- not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.

The solution is to write a query that traverses the XML document level by level to collect the element names at every level and concatenate them appropriately to construct the paths for all elements and attributes at every level.

There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.

First, let’s create a simple table with a small document that we can use in the examples:

create table mytable(xmldoc XML);

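insert into mytable values(
 '<Message>
    <Type>Urgent</Type>
    <Person id="123">
      <FirstName>Robert</FirstName>
      <LastName>Tester</LastName>
    </Person>
  </Message>');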

A first and straightforward solution is to start at the root of the document, then look at the first level of child nodes, then at the children of each of these child nodes, and so on. For each element or attribute we construct the path by concatenating the path from the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document:

xquery
for $L1 in db2-fn:xmlcolumn("MYTABLE.XMLDOC")/*
let $L1path := fn:string-join( ($L1/local-name() ),"/" )
return (
  $L1path,
  for $L2 in $L1/(*,@*)
  let $L2path := fn:string-join( ($L1path, $L2/local-name() ),"/" )
  return (
    $L2path,
    for $L3 in $L2/(*,@*)
    let $L3path := fn:string-join( ($L2path, $L3/local-name() ),"/" )
    return (
      $L3path,
      for $L4 in $L3/(*,@*)
      let $L4path := fn:string-join( ($L3path, $L4/local-name() ),"/" )
      return (
        $L4path,
        for $L5 in $L4/(*,@*)
        let $L5path := fn:string-join( ($L4path, $L5/local-name() ),"/" )
        return ($L5path)))));


6 record(s) selected.

The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than this, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That’s maybe not very elegant, but it works if you can define an upper bound on the depths of your XML documents, which is usually possible.

You probably notice that the path Message/Person/id should actually be Message/Person/@id because “id” is an XML attribute. The query can be enhanced to take care of such details. In the last two sample queries of my XML profiling post you have seen how to use the self::attribute() test for this purpose.

If you prefer a more elegant solution that does not require any assumption about the maximum depths of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let’s try SQL/XML for a change.

You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called “common table expression”, that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following query augments this approach with the XMLTABLE function that extracts nodes and node names from the XML:

WITH pathstable (name, node, xpath) AS (
  SELECT x.name AS name, x.node AS xmlnode, '/' || x.name AS xpath
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode, xpath || '/' || y.name AS xpath
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
         name    varchar(30) PATH 'local-name()',
         node    XML         PATH '.') AS y
) SELECT name, xpath
  FROM pathstable;

NAME                           XPATH
------------------------------ -------------------------------
Message                        /Message
Type                           /Message/Type
Person                         /Message/Person
id                             /Message/Person/id
FirstName                      /Message/Person/FirstName
LastName                       /Message/Person/LastName

6 record(s) selected

If you want to list the element and attribute values for each path, then you can easily modify this query as follows:

WITH pathstable (name, node, xpath, value) AS (
  SELECT x.name AS name, x.node AS xmlnode,
         '/' || x.name AS xpath, x.value as value
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode,
         xpath || '/' || y.name AS xpath, y.value as value
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
          name    varchar(30) PATH 'local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS y
) SELECT xpath, value
  FROM pathstable;

XPATH                           VALUE
------------------------------- --------------------
/Message                        UrgentRobertTester
/Message/Type                   Urgent
/Message/Person                 RobertTester
/Message/Person/id              123
/Message/Person/FirstName       Robert
/Message/Person/LastName        Tester

6 record(s) selected

A few things to note:

  • The value of an XML element is defined as the concatenation of all text nodes in the subtree under that element. This explains the values that you see for /Message and /Message/Person in the example above.
  • For longer paths you may need to increase the length of the VARCHAR(n) in the XMLTABLE function.
  • In DB2 you may receive warning SQL0347W, which says that this query might recursively run into an infinite loop. But, this would only happen if your XML document was infinitely deep, which isn’t possible. So, you can safely ignore that warning.
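The recursive queries above still report the attribute as id rather than @id. One possible way to distinguish attributes, sketched here under the assumption that the self::attribute() test is used as in the XML profiling post, is to compute the name with a conditional expression. The query below simply lists all element and attribute names in the document; the same PATH expression could be used for the name column in the recursive part of the queries above.

```sql
-- List every element and attribute name, prefixing attribute names with "@".
-- VARCHAR(31) leaves room for the extra "@" character.
SELECT x.name
FROM mytable,
     XMLTABLE('$d//(*, @*)' PASSING mytable.xmldoc AS "d"
       COLUMNS
         name VARCHAR(31) PATH 'if (self::attribute())
                                then fn:concat("@", fn:local-name(.))
                                else fn:local-name(.)') AS x;
```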


Part 2 of our article “Data normalization reconsidered” is now available at

The second part discusses alternatives to a traditional normalized relational representation of data. Such alternatives include for example XML, JSON, and RDF because they can often help you overcome normalization issues or improve schema flexibility, or both. In the 21st century, digitized business records are often created in XML to begin with, which makes XML an attractive choice as the database level storage format.

This article also contains a performance comparison between XML and relational data that was conducted for a real-world application scenario at a major international logistics company.

At the end of the article you will find comparison tables that summarize the pros and cons of different data representations.