The beauty of a “hybrid” database system is that it allows you to manage both XML and relational data at the same time, even side-by-side in the same table if you want. You can choose which data to store in XML and which in relational format. Best practices for determining the “right” mix of XML and relational in a database schema is an interesting topic, but I’ll reserve that for a separate blog post in the near future.

Today, let’s look at how you can insert information from relational columns into an XML document. Let’s start with a simple table and a couple of rows. Each row holds an XML document in an XML column as well as a document_id and the date when the document was created:

CREATE TABLE documents(document_id INTEGER, created DATE, document XML);

INSERT INTO documents
VALUES(1, '2011-05-20',
'<mydoc>
   <header>
     <title>How to skin a cat</title>
     <author id="17">John Doe</author>
   </header>
   <body>
     <info>There are many ways...</info>
   </body>
 </mydoc>');

INSERT INTO documents
VALUES(2, '2011-05-21',
'<mydoc>
   <header>
     <title>Introduction to XML</title>
     <author id="23">Peter Pan</author>
     <author id="49">Susie Swing</author>
   </header>
   <body>
     <info>What is XML?</info>
   </body>
 </mydoc>');

Now let’s assume we want to retrieve these documents and include the “created” date from the relational column as an XML element <created> within the <header> element. This is not hard. You can simply include an XQuery update expression in your query that retrieves the documents:

SELECT XMLQUERY('copy $new := $doc
                 modify do insert <created>{$crdate}</created>
                           into $new/mydoc/header
                 return $new'
                 PASSING documents.document as "doc",
                         documents.created  as "crdate")
FROM documents;

The XMLQUERY function contains the XQuery update expression (in single quotes) and the PASSING clause that passes column information to the XQuery expression. Here we are passing the XML columns “document” and the DATE column “created”, so that they can be referenced as XQuery variables $doc and $crdate.

DB2 for Linux, UNIX, and Windows also allows an abbreviated syntax, i.e. you can omit the PASSING clause and refeence the column names directly (in upper case), as in the following equiavalent query:

SELECT XMLQUERY('copy $new := $DOCUMENT
                 modify do insert <created>{$CREATED}</created>
                           into $new/mydoc/header
                 return $new' )
FROM documents;

The XQuery update expression assigns the XML from the XML column ($DOCUMENT) to the variable $new and then it modifies and returns $new. The “modify” clause inserts the new XML element <created> into the /mydoc/header element, using the value from the relational DATE column that is referenced here as $CREATED.

As a result, both of the queries above return the following output, with the desired <created> elements:

<mydoc>
  <header>
    <title>How to skin a cat</title>
    <author id="17">John Doe</author>
    <created>2011-05-20</created>
  </header>
  <body>
    <info>There are many ways...</info>
  </body>
</mydoc>

<mydoc>
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <created>2011-05-21</created>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

It’s that simple!

Note that these queries do not change the documents that are stored in the “documents” column. They only modify the documents on the fly while reading them out. If you want to permanently change the documents, you would use the exact same XQuery update expression in an SQL UPDATE statement rather than a SELECT statement, like this:

UPDATE documents
SET document =  XMLQUERY('copy $new := $DOCUMENT
                          modify do insert <created>{$CREATED}</created>
                                    into $new/mydoc/header
                          return $new' );

But, let’s stick with queries for now and extend the XQuery update expression to also merge the relational document_id into the XML. Specifically, assume we need to include the document_id as an attribute “docid” in the “mydoc” element. We can simply add another “insert” operation in the modify clause, which is easy because the modify clause can have a comma-separated list of operations, including insert, replace, and delete operations.

To add an attribute you have to specify “insert attribute” followed by the desired attribute name (docid) and the attribute value in curly brackets. The curly brackets can either contain a literal value in double quotes or -as in our case here- a reference, such as our reference to the relational DOCUMENT_ID column:

SELECT XMLQUERY('copy $new := $DOCUMENT
                 modify (do insert <created>{$CREATED}</created>
                            into $new/mydoc/header,
                         do insert attribute docid {$DOCUMENT_ID}
                            into $new/mydoc )
                 return $new' )
FROM documents;

And the result is as desired:

<mydoc docid="1">
  <header>
    <title>How to skin a cat</title>
    <author id="17">John Doe</author>
    <created>2011-05-20</created>
  </header>
  <body>
    <info>There are many ways...</info>
  </body>
</mydoc>

<mydoc docid="2">
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <created>2011-05-21</created>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

Going one step further, you might wonder how to insert data from relational columns in *other* tables into these XML documents. Well, you can do a join so that the desired values from the other table line up in the same row as your XML document, and then apply the same techniques as above.

It get’s slightly more interesting if you want add information from *multiple* rows into a single document, based on a one-to-many relationship. I can show you how to do this, but I’m about to board a flight to go  vacation, so you’ll find a couple of examples in my next blog post when I’m back, or many more examples in the DB2 pureXML Cookbook!

Advertisements

Intel has recently launched their new “Westmere-EX” CPUs that are now part of the Intel Xeon E7 processor family. For 5 years now Intel has a strong history of testing and showcasing many of their latest processors with the Transaction Processing over XML (TPoX) benchmark. So it comes as no big surprise that Intel has now also released TPoX benchmark results with their latest 10-core Xeon E7-4870 processor.

This benchmark uses TPoX Version 2.0 with 1TB of raw XML data, running on DB2 9.7 under SUSE Linux 10. DB2 compression reduces the 1TB of raw data to 537GB, which includes all required XML indexes.

The system under test is an Intel Xeon E7-4870 processor server with 4 CPUs, 40 cores, 80 threads, and a clock speed of 2.4GHz. Further details on the CPUs can be found here. The storage configuration consists of 54 Intel X-25E Solid State Drives (SSDs), 48 SSDs for the database tables and indexes plus 6 SSDs for the DB2 transaction log.

To showcase the maximum possible TPoX throughput on this hardware, Intel ran the TPoX workload driver with 440 concurrent connections (“users”) all of which issue a random stream of read and write transactions without think time. The result is a whopping 17,757 TTPS (TPoX Transactions per Second)!

This result constitutes 29% higher performance than on the previous generation CPU, the 8-core Nehalem-EX, which achieved 13,745 TTPS. Details on these TPoX benchmark results are available on the TPoX results page where you find a history of all TPoX results as well as examples of other TPoX usage.

The great thing about these benchmarks is that they continuously push hardware, operating system, and database to the edge, relentlessly exposing any inefficiencies that may exist and giving us the chance to continuously improve the technology. This latest TPoX benchmark is also one of the first TPoX tests that runs entirely on solid state drives!