A short summary of why and how the New York State Department of Taxation & Finance is using DB2 pureXML is available here:

http://smarterquestions.org/2011/06/a-true-story-about-using-xml-forms-and-db2-to-process-millions-of-tax-returns/

Many of the reasons why XML is useful for forms processing also apply to other forms-based application, e.g. in healthcare, insurance, government, and other industries. Most cases of forms-processing share the following characteristics:

  • Many different types of forms are being used, each one having a different set of fields. (This is schema diversity!)
  • Some forms tend to change over time, usually to support new or changed business processes or regulatory requirements. Simply put, fields on the forms get changed, added, or removed. (This is schema evolution!)
  • When information is filled into any given form, there are typically many optional fields that can -and often will- remain blank. (This is sparsely populated data!)

As it turns out, XML is very well suited to handle schema diversity, schema evolution, and sparse data. Optional XML elements and attributes that may or may not appear in a given document are convenient to handle sparse data, whereas in a relational model sparse data often leads to large numbers of NULL values.

XML also supports schema flexibility, e.g. documents for different schemas (or different versions of the same schema) can be stored, indexed, and queried in a single XML column.

And finally, if most of the application operations touch one form at a time, e.g. insert a form, retrieve a form, update a form, validate a form, and so on, then an XML solution where each form is a single XML object typically also provides significant performance benefits over a normalized relational schema for the same information.

 

 

In DB2 for Linux, UNIX, and Windows, the quickest and easiest way to produce XML from relational tables is to use the function XMLROW or XMLGROUP. The following examples illustrates how this works.

Let’s first define a relational table and insert a couple of rows:

create table employee(id int, name varchar(30), dept char(3));

insert into employee values (1, ‘John’, ‘PX8’);
insert into employee values (2, ‘Mary’, ‘KH2’);

select id, name, dept  from employees;

ID             NAME                          DEPT
———– —————————— ————————
          1     John                               PX8
          2    Mary                               KH2

  2 record(s) selected.

Now, let’s select the same columns with the XMLROW function:

select xmlrow(id, name, dept) from employee;

<row><ID>1</ID><NAME>John</NAME><DEPT>PX8</DEPT></row>

<row><ID>2</ID><NAME>Mary</NAME><DEPT>KH2</DEPT></row>

  2 record(s) selected.

You can see that each row is returned as an XML document, using column names as element names. And you can further customize the XML if you like. For example, you can rename the root element for each row:

select xmlrow(id, name, dept OPTION ROW “EMPL”) from employee;

<EMPL><ID>1</ID><NAME>John</NAME><DEPT>PX8</DEPT></EMPL>

<EMPL><ID>2</ID><NAME>Mary</NAME><DEPT>KH2</DEPT></EMPL>

  2 record(s) selected.

You can also rename the elements for individual columns, if you want:

select xmlrow(id AS “EMP_NO”, name, dept OPTION ROW “EMPL”) from employee;

<EMPL><EMP_NO>1</EMP_NO><NAME>John</NAME><DEPT>PX8</DEPT></EMPL>

<EMPL><EMP_NO>2</EMP_NO><NAME>Mary</NAME><DEPT>KH2</DEPT></EMPL>

  2 record(s) selected.

And if you prefer to combine all selected rows into a single XML document, you would use XMLGROUP instead of XMLROW:

select xmlgroup(id, name, dept OPTION ROW “EMPL”) from employee;

<rowset>
  <EMPL><ID>1</ID><NAME>John</NAME><DEPT>PX8</DEPT></EMPL>
  <EMPL><ID>2</ID><NAME>Mary</NAME><DEPT>KH2</DEPT></EMPL>
</rowset>

 1 record(s) selected.

For more options and information on these functions, see:

XMLROW:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0050587.html

XMLGROUP:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0050588.html

SQL/XML publishing functions for constructing XML values:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.xml.doc/doc/c0023909.html

In my previous blog post I promised that I would provide some examples of how to insert information from multiple relational tables and multiple rows into an XML document.

So, let’s continue the example from my previous post where we had a table with relational INTEGER and DATE columns as well as an XML column. For simplicity, we now use only one XML document in this XML column, but it could be many more:

DROP TABLE documents;
CREATE TABLE documents(document_id INTEGER, created DATE, document XML);

INSERT INTO documentsVALUES(2, '2011-05-21',
'<mydoc>
   <header>
     <title>Introduction to XML</title>
     <author id="23">Peter Pan</author>
     <author id="49">Susie Swing</author>
   </header>
   <body>
     <info>What is XML?</info>
   </body>
 </mydoc>');

Let’s add a second table “document_reviews” that contains a rating and review for each document. Documents and reviews are related to each other and can be joined based on the document_id column:


CREATE TABLE document_reviews(document_id INTEGER,
                              rating CHAR(2),
                              review VARCHAR(32000) );

INSERT INTO document_reviews VALUES(2, 'A+', 'This books is...');

Now we want to to read the XML document from the “documents” table and insert the corresponding rating information from the “document_reviews” table into the XML. This task is quite easy, using the techniques described in my previous post and simply using a join between the two tables:

SELECT XMLQUERY('copy $new := $DOCUMENT
                 modify do insert <rating>{$RATING}</rating>
                           into $new/mydoc/header
                 return $new' )
FROM documents, document_reviews
WHERE documents.document_id = document_reviews.document_id;

Note that the XQuery variable $RATING refers to the “rating” column in the document_reviews table. And the result of this query is the original document with the desired <rating> element inserted into the <header>:

<mydoc>
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <rating>A+</rating>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

1 record(s) selected.

Now it gets more interesting. What would the query above return if there were *multiple* review rows for this one document? Well, let’s find out! Let’s insert a second review for the same document_id and run the query again.

INSERT INTO document_reviews VALUES(2, ‘C’, ‘I did not like…’);

After inserting this additional review, our query above returns the XML document twice, once for each of the two reviews:

<mydoc>
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <rating>A+</rating>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

<mydoc>
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <rating>C</rating>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

2 record(s) selected.

The first document in the result set contains a <rating> element with the rating from the first review, and the second document contains a <rating> element according to the second review. The original XML document is duplicated for each rating, due to the nature of the relational join between the two tables.

Although this form of output might be useful in some cases, chances are that you would like all ratings for this document to be included in a single copy of the document. This can be achieved if we convert the join into a correlated subquery that produces a sequence of <rating> elements that are all inserted into a single copy of the document:

SELECT XMLQUERY('copy $new := $DOCUMENT
                 modify do insert db2-fn:sqlquery("
                             SELECT XMLELEMENT(NAME ""rating"", rating)
                             FROM document_reviews
                             WHERE document_id = parameter(1)" ,
                             $docid            )
                           into $new/mydoc/header
                 return $new'
                 PASSING documents.document_id as "docid" )
FROM documents;

In this query, the modify clause contains the function db2-fn:sqlquery that takes an SQL query and optional parameters as input. This SQL query reads the rating column from the document_reviews tables and constructs a sequence of <rating> elements.

The WHERE clause of this subquery is parameterized, using $docid as the parameter value (line 5). The $docid is passed into the XQuery context from the documents.document_id column (line 9), and ensures that the subquery only reads ratings for correct document. As a result, a single document with two <rating> elements is returned:

<mydoc>
  <header>
    <title>Introduction to XML</title>
    <author id="23">Peter Pan</author>
    <author id="49">Susie Swing</author>
    <rating>A+</rating>
    <rating>C</rating>
  </header>
  <body>
    <info>What is XML?</info>
  </body>
</mydoc>

1 record(s) selected.

I think these example provided a good idea of the powerful capabilities that exist to combine XML and relational data.