In times of data leakage, hacker attacks, trojan horses, and various other data security threats, efficient and effective data encryption has become a critical requirement for many enterprises. This need applies to XML as much as to most other forms of data.

Intel and IBM have collaborated to demonstrate that a combination of modern hardware and software can perform effective encryption with very high efficiency.

In particular, the latest Intel Xeon E5 and E7 processor families provide AES-NI, a set of new processor instructions that help accelerate encryption, decryption, key generation, matrix manipulation, and carry-less multiplication.

A joint benchmark has measured the performance impact of encrypting and decrypting XML data in DB2 9.7 using the IBM InfoSphere Guardium Data Encryption capabilities on the Intel Xeon E5 platform.

The results of the TPoX benchmark show that full encryption can be performed for a read/write XML transaction processing application with less than 4% overhead. This is a fantastic result.

More information on these tests, and on the technologies and products used, is available here:

http://software.intel.com/en-us/articles/overcoming-performance-obstacles-in-data-encryption

http://software.intel.com/sites/default/files/m/d/4/1/d/8/IBM_DMM_Intel_AES_Vormetric_FINAL.pdf

http://ibmdatamag.com/2012/05/overcoming-performance-obstacles-in-data-encryption/

XML from Vegas to Berlin!

October 21, 2012

I’m in Las Vegas right now, this glittering and crazy city in Nevada! Las Vegas is once again the location for IBM’s annual Information On Demand (IOD) conference, which is starting today!

IOD is the premier IBM event for Information Management software, including DB2, Informix, Netezza, Cognos, Big Data, SPSS, Content Management, and other product areas. The conference program is very diverse and has something for everyone. At IOD you can choose from more than 1500 sessions and 100+ hands-on labs.

Not surprisingly, XML continues to be an important topic for information management, and there are various XML-related sessions at IOD. In particular, I’m looking forward to the following sessions where some of IBM’s customers are sharing their success stories with DB2 pureXML:

Session 3674A:
IBM pureXML in Financial Applications: Experiences From Vanguard
Mon, Oct 22, 2012, 11:30 AM – 12:30 PM
Speakers: George White and Milton Beaver, The Vanguard Group

Session 1284B:
IBM DB2 and XML: Excellent Opportunity or Extra Problems? Are You Ready?
Mon, Oct 22, 2012, 2:15 PM – 3:15 PM
Speaker: Kurt Struyf, suadasoft

Session 1613A:
How to Get Warehouse-Type Performance With XML Tables
Wed, Oct 24, 2012, 2:30 PM – 3:30 PM
Speaker: Ray Sippel, BJC HealthCare

Additionally, you can get hands-on experience with XML on DB2 for Linux, UNIX, and Windows and DB2 for z/OS in the following hands-on labs:

Lab 1255A
Beyond SQL With IBM DB2 10: Maintaining and Querying XML Data and RDF Stores
Thu, Oct 25, 2012, 1:30 PM – 4:30 PM

Lab 1314A
New Features in IBM DB2 10 for z/OS Improve Development Productivity (including XML features)
Mon, Oct 22, 2012, 10:00 AM – 1:00 PM

If you can’t be in Las Vegas this week (maybe because you’re on the other side of the globe?), how would Berlin work for you?!

The annual European conference of the International DB2 User Group (IDUG) is coming up in Berlin, Germany, on November 4 to 9, 2012. For a number of years there hasn’t been an IDUG conference without any XML sessions, and the same is true this year. Here are some presentations that you can see in Berlin:

“An XML Document’s Life – Dr. Node!”
Speaker: Terri Grissom, BMC Software
Wed, Nov 07, 2012, 11:00 AM – 12:00 PM

“Breaking the Relational Limit with pureXML in DB2 for z/OS”
Speaker: Mengchu Cai, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 01:00 PM – 02:00 PM

“How to Design a Hybrid XML/Relational Database Schema”
Speaker: Matthias Nicola, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 02:15 PM – 03:15 PM

I hope to see you either in Vegas or in Berlin! Enjoy your conference!

In DB2, validation of XML documents against XML Schemas is optional. If you choose to validate XML documents in DB2, the most typical scenario is to validate XML documents when they are inserted or loaded into the database. This makes sense: if you ensure that the XML that enters the database is valid, then subsequent queries can assume that the data is valid and complies with a particular XML Schema.

Likewise, validation in XML update statements ensures that document replacement or document modifications do not violate your XML Schema.

Here is a simple example for document validation in INSERT and UPDATE statements, based on an XML Schema that was registered in the DB2 XML Schema Repository (XSR) under the SQL name db2admin.myschema:

CREATE TABLE mytable (id INTEGER NOT NULL PRIMARY KEY, doc XML);

INSERT INTO mytable
VALUES(?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema));

UPDATE mytable
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

UPDATE mytable
SET doc = XMLVALIDATE( XMLQUERY('copy $new := $DOC
                                 modify do insert <status>delivered</status>
                                           into $new/message/header
                                 return $new')
             ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

There are also cases when you might want to validate XML as part of a query. There can be several reasons for that:

  •  Documents were inserted or updated without validation and you need to validate them before consumption.
  •  You wish to validate XML documents against a different schema than the one that was used for validation upon insert or update.
  •  You are extracting fragments of stored XML documents and wish to validate them against a specific schema.
  •  Your queries are constructing entirely new XML documents and you wish to validate that the constructed XML complies with a given schema.

Regardless of the motivation, XML validation in a query is simple.

You can simply use the XMLVALIDATE function in a SELECT statement. All the options for XMLVALIDATE that are allowed in an INSERT or UPDATE statement are allowed here as well. Let’s look at several examples:

SELECT XMLVALIDATE(doc ACCORDING TO XMLSCHEMA ID db2admin.myschema)
FROM mytable
WHERE id = 5;

The query above reads a specific document and performs schema validation against the XML Schema that was registered as db2admin.myschema.
If the selected document is valid for the specified schema, the document is returned.
If the selected document is not valid for the specified schema, the query fails and produces an error code that points to why the schema is violated.
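
Before validating in a query, it can help to find out which documents were stored without validation in the first place. The following is a minimal sketch, assuming DB2 for Linux, UNIX, and Windows, where the VALIDATED predicate and the XMLXSROBJECTID function are available. The table and column names are the ones from the examples above, and the result column name schema_object_id is just an illustrative choice:

```sql
-- list the documents that were stored without schema validation
SELECT id
FROM mytable
WHERE doc IS NOT VALIDATED;

-- show the XSR object ID of the schema that each document was validated
-- against; the function returns 0 for documents that were not validated
SELECT id, XMLXSROBJECTID(doc) AS schema_object_id
FROM mytable;
```

The ids returned by the first query are candidates for an XMLVALIDATE call in a SELECT statement, as in the query above.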

Instead of the XML column name doc, the XMLVALIDATE function can take any argument of type XML, such as the result of an XMLQUERY function. The following query uses the XMLQUERY function to extract just the message body from an XML document and validates it against the XML Schema db2admin.msgbodyXSD:

SELECT XMLVALIDATE( XMLQUERY('$DOC/message/body')
            ACCORDING TO XMLSCHEMA ID db2admin.msgbodyXSD )
FROM mytable
WHERE id = 5;

The next query constructs a new XML document and validates it as part of the query:

SELECT XMLVALIDATE(        
         XMLQUERY('document{
                    <newdocument>
                      <header>{$DOC/party/identity}</header>
                      <body>
                          {$DOC/party/name}
                          {$DOC/party/details/address}
                      </body>
                    </newdocument>}')
       ACCORDING TO XMLSCHEMA ID db2admin.newdocXSD)
FROM mytable
WHERE id = 5;

These examples give you an idea of the capabilities for validating XML query results against an XML Schema.

 

Recently I received some questions about the result sets when querying XML, and especially when querying repeating elements that occur more than once per document.

As it turns out, the same logical result can be returned in different ways, depending on how you write your XQuery or SQL/XML query.

Let’s look at a simple table with two XML documents, and then at several different queries against that data. Here is the sample data:


create table testtable(doc XML);

insert into testtable(doc)
 values ('<a id="1">
             <b>1</b>
             <b>2</b>
          </a>');

insert into testtable(doc)
 values ('<a id="2">
             <b>3</b>
             <b>4</b>
             <b>5</b>
         </a>');

Now assume we want to return all the <b> elements from these two documents. You can write such a query in several different ways, each returning the same <b> elements in a slightly different way:

  1. XQuery FLWOR expression
  2. XQuery FLWOR expression within an SQL VALUES clause
  3. SQL/XML query with the XMLQUERY function
  4. SQL/XML query with the XMLTABLE function

Let’s look at each of these options in turn.

1. XQuery FLWOR expression

The first example is a simple XQuery FLWOR expression. It iterates over the path /a/b in all documents and returns the <b> elements one by one. The result is a sequence of 5 elements, and each is returned as a single item in the result set:

xquery
 for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
 return $b;

<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>

5 record(s) selected.

2. XQuery FLWOR expression in an SQL VALUES clause

If you enclose the same FLWOR expression in an SQL VALUES clause then the same XML elements are returned in a different format.

In this example, the VALUES clause produces a single value. The SQL type of that value is the XML data type and the value itself is a sequence of 5 elements. The entire sequence is returned as a single value of type XML:

values(xmlquery('
 for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
 return $b'));

<b>1</b><b>2</b><b>3</b><b>4</b><b>5</b>

1 record(s) selected.

3. SQL/XML query with the XMLQUERY function

You could also write an SQL SELECT statement and include your XQuery or XPath expression in an XMLQUERY function.

Note that the XMLQUERY function is a scalar function, i.e. it returns one result value of type XML for each row that it is applied to. Since our sample table contains two rows, the following query returns two result values of type XML. The first value is a sequence with all the <b> elements from the first document, and the second value is the sequence of all <b> elements from the second document:

SELECT xmlquery('for $b in $DOC/a/b return $b') as col1
FROM testtable;

COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>

2 record(s) selected.


-- same result with a simple XPath:
SELECT xmlquery('$DOC/a/b') as col1
FROM testtable;

COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>

2 record(s) selected.

The potential benefit of this result format is that you know exactly which <b> elements came from the same input document. If you prefer to return each <b> element as a separate item, use the XMLTABLE function.

4. SQL/XML query with the XMLTABLE function

The XMLTABLE function is not a scalar function; it’s a table function. This means that it returns a set of result rows for each input document. More precisely, it returns one result row for each item that is produced by the row-generating expression /a/b:

-- return a column of type XML:
SELECT X.*
FROM testtable,
 XMLTABLE('$DOC/a/b'
 COLUMNS
 col1 XML PATH '.') as X;

COL1
----------------------------
<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>

5 record(s) selected.

-- return a column of type integer:
SELECT X.*
FROM testtable,
 XMLTABLE('$DOC/a/b'
 COLUMNS
 col1 INTEGER PATH '.') as X;

COL1
--------
 1
 2
 3
 4
 5

5 record(s) selected.
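
If you want one row per <b> element but also want to keep track of which document each element came from, you can add a second column that navigates from each <b> element up to its parent. The following is a sketch against the sample data above; the column names doc_id and b_val are illustrative choices, not part of the original examples:

```sql
-- one row per <b> element, plus the id attribute of its parent <a> element
SELECT X.*
FROM testtable,
 XMLTABLE('$DOC/a/b'
 COLUMNS
 doc_id INTEGER PATH '../@id',
 b_val  INTEGER PATH '.') as X;
```

With the two sample documents, this should pair the values 1 and 2 with document id 1, and the values 3, 4, and 5 with document id 2.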

The result sets in all of these examples make sense and are consistent with SQL semantics. You can choose the shape of your query results and write your queries accordingly.

 

This is an add-on to my previous post on Using the XMLTABLE function in UPDATE and MERGE statements.

I had tested all the examples in that previous post in DB2 for Linux, UNIX, and Windows but overlooked that DB2 for z/OS currently has a restriction for the MERGE statement.

The MERGE statement in DB2 for z/OS expects a VALUES clause to provide the data that is to be merged, not an arbitrary sub-select. For details, see:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db2z9.doc.sqlref/src/tpc/db2z_sql_merge.htm

As a result, the MERGE example at the end of my previous post needs to be adjusted for DB2 for z/OS. For example, you can use an SQL procedure to loop over the rows produced by the XMLTABLE query and feed these rows into the MERGE statement:


CREATE PROCEDURE XMLMERGE(IN P_CHANGE XML)
LANGUAGE SQL
BEGIN
  DECLARE ID INT;
  DECLARE X INTEGER;
  DECLARE Y VARCHAR(20);
  DECLARE SQLCODE INT;

  DECLARE C1 CURSOR FOR
    SELECT ID, X, Y
    FROM XMLTABLE('$DOC/root/mydata' PASSING P_CHANGE AS "DOC"
         COLUMNS
           id INTEGER     PATH '@id',
           x  INTEGER     PATH 'elem1',
           y  VARCHAR(20) PATH 'elem2') T;

  OPEN C1;
    LOOP1: LOOP
      FETCH C1 INTO ID, X, Y;
      IF SQLCODE <> 0 THEN LEAVE LOOP1; END IF;

      MERGE INTO RELTABLE   R
      USING (VALUES(ID, X, Y)) AS N(ID, X, Y)
      ON (R.ID = N.ID)
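      -- RELTABLE(id, col2, col3) as defined in the previous post;
      -- source values are qualified with N to avoid ambiguous names
      WHEN MATCHED THEN UPDATE SET R.COL2 = N.X, R.COL3 = N.Y
      WHEN NOT MATCHED THEN INSERT VALUES(N.ID, N.X, N.Y);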
    END LOOP LOOP1;
  CLOSE C1;
END

Thanks to my colleague Guogen Zhang for providing this sample solution.

The XMLTABLE function is part of the SQL standard. It can take XML as input, use XPath or XQuery expressions to extract values or pieces from the XML, and return the result in a relational row and column format.

As discussed in a previous blog post, the XMLTABLE function is often used in queries to read XML documents from an XML column in the database and return the extracted values as a relational result set.

But the XMLTABLE function can also be used in INSERT and UPDATE statements. This allows an application to pass an XML document as a parameter to an INSERT or UPDATE statement that extracts selected values and uses them to insert or update relational rows in a table.

Using the XMLTABLE function in INSERT statements is quite common if requirements dictate that XML needs to be shredded to relational tables rather than stored natively in an XML column.

Here is a simple example that extracts values from a small XML document and inserts these values into the table “reltable”:


CREATE TABLE reltable(id int, col2 int, col3 varchar(20));

INSERT INTO reltable
SELECT id, x, y
FROM XMLTABLE ('$doc/mydata'
       PASSING xmlparse(document '<mydata id="1">
                                     <elem1>555</elem1>
                                     <elem2>test</elem2>
                                  </mydata>') as "doc"
       COLUMNS
          id INTEGER     PATH '@id',
          x  INTEGER     PATH 'elem1',
          y  VARCHAR(20) PATH 'elem2'   );

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         555 test

1 record(s) selected.

The PASSING clause assigns the input document to the variable “$doc” which is then the starting point for the extraction.

You don’t necessarily need to hardcode the XML input document in the INSERT statement. It can also be supplied via a parameter marker or host variable, as in this example:

INSERT INTO reltable
SELECT id, x, y
FROM XMLTABLE ('$doc/mydata'  PASSING  cast(? as XML) as "doc"
       COLUMNS
          id INTEGER     PATH '@id',
          x  INTEGER     PATH 'elem1',
          y  VARCHAR(20) PATH 'elem2'   );

Ok, this was the easy part.

Now what if we want to use the XMLTABLE function in an UPDATE statement to replace the values in an existing row in “reltable” with new values from an incoming XML document? It might not be quite as obvious how to write such an UPDATE statement, but as it turns out it’s not very hard either!

Here is an UPDATE statement that extracts the values of the XML elements “elem1” and “elem2” to update the columns “col2” and “col3” in “reltable” for the row that has id = 1.

UPDATE reltable
SET (col2, col3) = (SELECT x, y
                    FROM XMLTABLE ('$doc/mydata' PASSING
                           xmlparse(document '<mydata id="1">
                                                 <elem1>777</elem1>
                                                 <elem2>abcd</elem2>
                                              </mydata>') as "doc"
                         COLUMNS
                           x  INTEGER     PATH 'elem1',
                           y  VARCHAR(20) PATH 'elem2'   )
                    )
WHERE id = 1;

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         777 abcd

1 record(s) selected.

The UPDATE statement above uses a simple WHERE clause to select a specific row to be updated. But what if we don’t know which target row the incoming XML document needs to be applied to? Ideally, we want to extract the @id attribute from the input document and update whichever row matches its value.

We might be tempted to simply extract the @id attribute in the same XMLTABLE function and add a join predicate to the subselect, like this:

-- this statement is not a good idea!
UPDATE reltable r
SET (r.col2, r.col3) = (SELECT T.x, T.y
                        FROM XMLTABLE ('$doc/mydata'
                              PASSING  cast(? as XML) as "doc"
                              COLUMNS
                                id INTEGER     PATH '@id',
                                x  INTEGER     PATH 'elem1',
                                y  VARCHAR(20) PATH 'elem2' ) T
                        WHERE r.id = T.id
                       );

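But this statement would update all rows in the table (many of them with NULL values) because the WHERE clause only applies to the rows produced by the subselect, not to the rows in “reltable”. For any row in “reltable” without a matching id, the subselect returns no row, and the columns are set to NULL.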

One possible solution is to add a WHERE clause that extracts the @id attribute as needed to filter the rows in “reltable”:

-- this statement is better, but not optimal
UPDATE reltable
SET (col2, col3) = (SELECT x, y
                    FROM XMLTABLE ('$doc/mydata'
                          PASSING  cast(? as XML) as "doc"
                          COLUMNS
                            x  INTEGER     PATH 'elem1',
                            y  VARCHAR(20) PATH 'elem2' )
                    )
WHERE id = XMLCAST( XMLQUERY('$doc/mydata/@id'
                    PASSING cast(? as XML) as "doc") AS INTEGER);

Yes, this works, but it requires us to pass the XML document into the statement twice: once into the subselect in the SET clause, and once into the XMLQUERY function in the WHERE clause. That’s not very elegant and probably not ideal for performance either.

There might be multiple ways to improve the UPDATE statement above. One nice solution is to use a MERGE statement:


MERGE INTO reltable r
USING (SELECT id, x, y
       FROM XMLTABLE ('$doc/mydata' PASSING  cast(? as XML) as "doc"
             COLUMNS
               id INTEGER     PATH '@id',
               x  INTEGER     PATH 'elem1',
               y  VARCHAR(20) PATH 'elem2'   )) t
ON r.id = t.id
WHEN MATCHED
THEN UPDATE SET r.col2 = t.x, r.col3 = t.y;

The nice thing about the MERGE statement is that it can also handle cases where the XMLTABLE function produces multiple rows whose keys may or may not already exist in the target table, so that UPDATE and/or INSERT operations need to be performed. Here is an example:


MERGE INTO reltable r
USING (SELECT id, x, y
       FROM XMLTABLE ('$doc/root/mydata'
             PASSING xmlparse(document '<root>
                                         <mydata id="1">
                                          <elem1>999</elem1>
                                          <elem2>xyz</elem2>
                                         </mydata>
                                         <mydata id="3">
                                          <elem1>333</elem1>
                                          <elem2>test33</elem2>
                                         </mydata>
                                        </root>') as "doc"
             COLUMNS
               id INTEGER     PATH '@id',
               x  INTEGER     PATH 'elem1',
               y  VARCHAR(20) PATH 'elem2'   )) t
ON r.id = t.id
WHEN MATCHED     THEN UPDATE SET r.col2 = t.x, r.col3 = t.y
WHEN NOT MATCHED THEN INSERT VALUES(t.id, t.x, t.y);

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         999 xyz
          3         333 test33

2 record(s) selected.

This blog post has provided some basic examples that might serve as a starting point to develop your own INSERT, UPDATE, and MERGE statements with the XMLTABLE function.

First, let’s revisit the concept of inlined XML storage and then discuss the pros and cons of inlining.

What is XML Inlining?

In short, inlining is an optional storage optimization in DB2 for “small” XML documents.

When you define a table with an XML column in DB2, such as CREATE TABLE mytable(id INTEGER, ….. , doc XML), the DB2 server creates three storage objects in the table space:

  • A data object (DAT), which holds the relational rows of the table
  • An index object (INX), which stores any indexes for the table
  • An XML storage object (XDA), which is the XML Storage Area and holds any XML documents

Optionally, you can assign these three objects to different table spaces but by default they all go into the same table space.

As illustrated in the following picture, the XML column in the data object doesn’t contain the actual XML document, but only references (descriptors) of where the documents can be found. The XML document trees are stored in the XDA object, and if a tree is larger than a single page then it is automatically cut into multiple regions. This way, large documents can span many pages, and that’s completely transparent to your application.

The regions index, which is automatically defined and maintained by DB2, essentially remembers which regions belong to the same XML document for any given row in the data object. The regions index also enables very efficient access to any portion of an XML document. If only some part of a large document is required, e.g. to answer a query, DB2 does not necessarily need to bring all pages of the document into the buffer pool.

The XML documents that are small enough to fit onto a single page just have a single region, and multiple regions can be stored on the same page, if space is available.

As it turns out, there are many applications that deal with small XML documents, often just 1KB to 20KB for most documents. The access to such small documents can be optimized by storing their document trees right in the DAT object, together with the rows that they belong to. This is called inlined XML storage and is illustrated in the following picture.

To enable inlining, define the XML column with an INLINE LENGTH that indicates the maximum size up to which you want documents to be inlined. For example:

CREATE TABLE mytable(id INTEGER, ….. , doc XML INLINE LENGTH 30000)

In this example, any XML documents whose parsed hierarchical representation is less than 30000 bytes will be inlined. Any documents that are larger than this are automatically stored in the XDA object as usual. The application sees no difference.

The specified INLINE LENGTH must be smaller than the page size of the DB2 table space minus the length of any other columns in the table.
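
If the table already exists, the inline length can also be set with an ALTER TABLE statement rather than at table creation time. The following is a sketch, assuming DB2 for Linux, UNIX, and Windows and the table mytable from above. The value 1000 is an arbitrary illustration and is subject to the same page-size limit:

```sql
-- set an inline length for the existing XML column "doc";
-- the value must still fit within the page size of the table space
ALTER TABLE mytable
  ALTER COLUMN doc SET INLINE LENGTH 1000;
```

How documents that are already stored in the XDA object are treated after such a change can depend on your DB2 version, so it is worth checking the documentation before relying on it.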

Inlined documents are stored in the same tree format as documents that are not inlined, just in a different location.

The Pros and Cons of Inlining

Performance measurements and plenty of experience with real XML applications have shown that inlining is almost always recommended if a large percentage of your XML documents (for example, more than 70%) can be inlined. The benefits of inlining include the following:

  • Faster access to inlined documents – no redirection via the regions index
  • The regions index has no entries for inlined documents. If a large percentage of your documents are inlined then this reduces the space and maintenance cost associated with the regions index.
  • Better prefetching since inlined XML documents are prefetched as part of the row they belong to.

A key characteristic of inlining is that it drastically increases the row size on the data pages, and hence reduces the number of rows per data page. This can negatively impact the performance of queries that read only relational columns and do not access the XML column. For example, consider the following table and query:

CREATE TABLE myxml(c1 INT, c2 INT, c3 INT, doc XML INLINE LENGTH 30000);

SELECT SUM(c1 + c2 + c3)
FROM myxml;

This query reads the 3 integer columns from *all* rows in the table. Due to the inlined XML column, these rows are spread over a much larger number of pages than without inlining, so this query needs to fetch a lot more pages than without inlining. If you have many such queries, then inlining might not be the best choice for you.

How do I know whether a given document is inlined?

The DB2 function ADMIN_IS_INLINED can be applied to an XML column and it returns 1 if a document is inlined, and zero otherwise. This enables you to determine which and how many documents are inlined.

The DB2 function ADMIN_EST_INLINE_LENGTH can also be applied to an XML document in an XML column and returns the smallest required inline length for this document to be inlined, or -1 if the document is too large to be inlined.
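
Here is a sketch of how these two functions might be used, assuming a table mytable with the columns id and doc as in the CREATE TABLE example above. The result column names is_inlined, est_inline_length, and num_docs are illustrative choices:

```sql
-- per document: is it inlined, and what inline length would it need?
SELECT id,
       ADMIN_IS_INLINED(doc)        AS is_inlined,
       ADMIN_EST_INLINE_LENGTH(doc) AS est_inline_length
FROM mytable;

-- summary: how many documents are inlined and how many are not?
SELECT is_inlined, COUNT(*) AS num_docs
FROM (SELECT ADMIN_IS_INLINED(doc) AS is_inlined
      FROM mytable) AS t
GROUP BY is_inlined;
```

The first query can help you choose an INLINE LENGTH that covers most of your documents, and the second one tells you what percentage of documents is actually inlined with the current setting.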