How to list all elements and attributes in DB2 for z/OS
January 26, 2013
In a previous post I described how to write a query that produces a list of all elements and attributes that occur in a document or a set of documents.
The core idea of that technique was to use the XPath expression $doc//(*, @*) in the XMLTABLE function. In this expression, $doc references the XML document or XML column that is used as input. This expression also uses the so-called comma operator that combines two squences into one. These two sequences are are all elements (*) and all attributes (@*).
For this technique to work in DB2 for z/OS, you need to code the XPath expression in a slightly different way: ($doc//*, $doc//@*) , which lists all elements followed by all attributes.
It is useful to sort the result by the parent of each node in the document so that the elements and attributes that belong to the same parent appear in consecutive order in the result set.
The following listing shows two queries that produce the same result set, and the second of the two queries is the recommendad notation for DB2 for z/OS:
SELECT T.*
FROM purchaseorder p,
XMLTABLE('$doc//(*, @*)' passing p.porder as "doc"
COLUMNS
node VARCHAR(20) PATH 'name(.)',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;
SELECT T.*
FROM purchaseorder p,
XMLTABLE('($doc//*, $doc//@*)' passing p.porder as "doc"
COLUMNS
node VARCHAR(20) PATH 'name(.)',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;
And then you can ertainly extend such queries to also list the node values or an indication whether the node is an element or an attribute, as previously described here.
Similar considerations apply to queries that list all paths in a document.
nativexmldatabase.com – 2008 to 2012
December 28, 2012
This blog on XML database technology and DB2 pureXML has been active for almost 5 years now. My former colleage Conor O’Mahony ran this blog from early 2008 to December 2009, and I took over on Dec 15, 2009. We have received a lot of feedback and many of the blog posts were inspired by questions from XML database users.
I’m not sure yet how much time I will have for blogging in 2013 and beyond. In any case, I liked the idea of producing a summary page that provides a quick overview of all the topics and posts in this blog. But, how to produce such a summary most efficiently?
Well, true to the topic of this blog I decided to load this blog as a piece of XML into DB2 pureXML and run queries on it!
First I downloaded the complete blog as an HTML document. But, HTML is not necessarily wellformed XML because HTML allows elements with missing end tags and other goofy things. Luckily there are free tools available that convert HTML into XHTML. XHTML is a form of HTML that complies with the rules for well-formed XML documents.
After converting the entire blog into an XML (XHTML) document, it took only three steps to create a list of blog post titles, dates, and URLs:
CREATE TABLE nativexmldatabase(doc XML); LOAD FROM blog.del OF DEL REPLACE INTO nativexmldatabase(doc); SELECT date, title, url FROM nativexmldatabase, XMLTABLE('$DOC//div[@class="entrytitle"]' COLUMNS seq FOR ORDINALITY, date VARCHAR(20) PATH 'h3', title VARCHAR(105) PATH 'h2/a', url VARCHAR(145) PATH 'h2/a/@href') AS T;
Of course, a quick look into the XML document is necessary to discover the required tag names and XPath expressions, such as //div[@class="entrytitle"] to iterate over all the individual posts, or h2/a and h2/a/@href to retrieve the title and URL of each post.
This query produces a nice list of all blog post titles, dates, and URLs. Instead of posting this listing as-is, I decided to roughly group the blog posts along the following topic areas:
* Design
* Performance
* Migration
* Relational to XML or XML to relational
* XML Queries
* XML Storage and Indexing
* XML Schemas
* Application Development
* Use Cases and Case Studies
* ETL and Warehousing
* DB2 pureXML Product News
* Other Products and Tools
* Publications and Resources
* XML Coverage at IDUG and IOD Conferences
* News, Announcements, Miscellaneous
And here is the result, a table of contents for nativexmldatabase.com from 2008 through 2012. I hope this can serve as a quick reference to topics of interest.
Happy browsing!
Good and bad XML design
December 4, 2012
Whenever people design things –such as houses, software systems, database schemas, or even XML structures– there are good and not so good design options. Roughly speaking, a design is usually considered “good” if it allows for easy and (cost-)efficient use and implementation of whatever is being built.
For XML documents we have seen various cases of “good” and “bad” designs over the years. Previously I wrote about name/value pair design for XML documents, which often is not a good choice.
Recently I received a question regarding another design choice, which I would like to share here in a simplified form.
The question was about an XML message format to transmit a list of order numbers and the corresponding order system identifiers. Two alternative XML structures were proposed:
Option A:
<OrderList>
<Order>
<OrderNumber>A0000001</OrderNumber>
<OrderSystem>ABC</OrderSystem>
</Order>
<Order>
<OrderNumber>B0000001</OrderNumber>
<OrderSystem>XP1</OrderSystem>
</Order>
<Order>
<OrderNumber>C0000001</OrderNumber>
<OrderSystem>Q-9</OrderSystem>
</Order>
</OrderList>
Option B:
<OrderList>
<OrderNumber>A0000001</OrderNumber>
<OrderSystem>ABC</OrderSystem>
<OrderNumber>B0000001</OrderNumber>
<OrderSystem>XP1</OrderSystem>
<OrderNumber>C0000001</OrderNumber>
<OrderSystem>Q-9</OrderSystem>
</OrderList>
The motivation for option B might be a smaller message size. One might argue that XML elements are by definition ordered, and if we know that the message consists of pairs of order number and order system, then XML format B is as easy to interpret as format A. Or, is it not?
The problem that I see with option B is that the absence of the <Order> elements gives up the explicit structure of the data and instead requires implicit knowledge, i.e. one must know that the OrderNumber at position i belongs to the OrderSystem at position i+1.
In my view, this design is against the principles of XML. In XML, the tags and their nesting should describe the data explicitly. XML is meant to be a hierarchical data format so that data items that belong together are grouped under the same parent (as in option A). After all, that is the benefit of XML over a flat file format.
I also think that using the explicit structure of option A is less error-prone than using option B which, requires implicit knowledge based on element positions.
How about queryability?
It turns out that XML format A is considerably easier to query than option B. Let’s assume you need to code a SQL/XML query to list the order numbers and systems. For that exercise we assume that the messages are stored in a table “mytable” with XML column “doc”.
The query for XML option A is very simple, using the XMLTABLE function in its most basic form. The row-generating XPath expression $DOC/OrderList/Order iterates over the orders, and the column expressions extract the order number and system.
http://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/
-- Query for XML option A:
SELECT OrderNumber, OrderSystem
FROM mytable,
XMLTABLE ('$DOC/OrderList/Order'
COLUMNS
OrderSystem VARCHAR(20) PATH 'OrderSystem',
OrderNumber VARCHAR(20) PATH 'OrderNumber' );
ORDERNUMBER ORDERSYSTEM
-------------------- --------------------
A0000001 ABC
B0000001 XP1
C0000001 Q-9
3 record(s) selected.
How can you produce the same result set from XML format B? This is remarkably tricky! You need to iterate over the child elements and keep track of which element is at position i versus i+1 to produce the proper pairs in the output.
There are likely several ways to code the query for XML option B, and you see one solution below. It seems hard to query the elements by name alone because the important structural information is by position. Hence, this query iterates over the positions 1 to N, where N is the number of elements. For each even position 2, 4, 6, etc. the query produces the pair of the current ($pos) and the previous ($pos – 1) element to construct the proper output.
-- Query for XML option B:
SELECT OrderNumber, OrderSystem
FROM mytable,
XMLTABLE('for $pos in (1 to count($DOC/OrderList/*))
where $pos mod 2 = 0
return <pair>
{$DOC/OrderList/*[$pos - 1]}
{$DOC/OrderList/*[$pos]}
</pair>'
COLUMNS
OrderNumber VARCHAR(20) PATH 'OrderNumber',
OrderSystem VARCHAR(20) PATH 'OrderSystem' );
OrderSYSTEM OrderNUMBER
-------------------- --------------------
ABC A0000001
XP1 B0000001
Q-9 C0000001
3 record(s) selected.
This query is not only more complex but also potentially less efficient. Note that the return clause of the FLOWR expresion constructs the explicit pairing that it missing in XML format B.
Let me know if you can think of a simpler query to produce the same result from XML format B.
Next, imagine a message that doesn’t have just 2 fields per order but maybe 10 or 100 fields per order. Then option B becomes increasingly more ugly.
Or worse, what if there is a schema change that allows one order to have a different number of fields than the next? Then the query that we coded for XML format B will no longer work because it was based on a specific structural assumption that is no longer true. In contrast, our query for XML format A will work even if you add additional fields to some of the orders in the message.
In summary, the benefits for XML option A include the following:
- Explicit structure rather than implicit positional format
- Easier and more efficient to query (query by name rather query by postion)
- More resilient to schema changes
I think that these benefits far outweigh the concern that XML format A is larger. Also, you could chose to send or store the XML in a compressed format to greatly alleviate space concerns.
Overcoming Performance Obstacles in XML Encryption
October 23, 2012
In times of data leakage, hacker attacks, trojan horses, and various other data security threats, efficient and effective data encryption has becomes a critical requirement for many enterprises. This need applies to XML as much as to most other forms of data.
Intel and IBM have collaborated to demonstrate that a combination of modern hardware and software can perform effective encryption with very high efficiency.
In particular, the latest Intel Xeon E5 and E7 processor families provide AES-NI, which is a set of new instructions in the Intel Xeon processor that help accelerate encryption, decryption, key generation, matrix manipulation, and carry-less multiplication.
A joint benchmark has measured the performance impact of encrypting and decrypting XML data in DB2 9.7 using the IBM InfoSphere Guardium Data Encryption capabilities on the Intel Xeon E5 platform.
The results of the TPoX benchmark show that full encryption can be performed for a read/write XML transaction processing application with less than 4% overhead. This is a fantastic result.
More information on these tests and on the technologies and products used, is available here:
http://software.intel.com/en-us/articles/overcoming-performance-obstacles-in-data-encryption
http://software.intel.com/sites/default/files/m/d/4/1/d/8/IBM_DMM_Intel_AES_Vormetric_FINAL.pdf
http://ibmdatamag.com/2012/05/overcoming-performance-obstacles-in-data-encryption/
XML from Vegas to Berlin!
October 21, 2012
I’m in Las Vegas right now, this glittering and crazy city in Nevada! Las Vegas is once again the location for IBM’s annual Information On Demand (IOD) conference, which is starting today!
IOD is the premier IBM event for Information Management software, including DB2, Informix, Netezza, Cognos, Big Data, SPSS, Content Maagement, and other product areas. The conference program is very diverse and has something for everyone. At IOD you can choose from more than 1500 sessions and 100+ hands-on labs.
Not surprisingly, XML continues to be an important topic for information management, and there are various XML-related sessions at IOD. In particular, I’m looking forward to the following sessions where some of IBM’s customers are sharing their success stories with DB2 pureXML:
Session 3674A:
IBM pureXML in Financial Applications: Experiences From Vanguard
Mon, Oct 22, 201, 11:30 AM – 12:30 PM
Speakers: George White and Milton Beaver, The Vanguard Group
Sessio 1284B:
IBM DB2 and XML: Excellent Opportunity or Extra Problems? Are You Ready?
Mon, Oct 22, 2012, 2:15 PM – 3:15 PM
Speaker: Kurt Struyf, suadasoft
Session 1613A:
How to Get Warehouse-Type Performance With XML Tables
Wed, Oct 24, 2012, 2:30 PM – 3:30 PM
Speaker: Ray Sippel, BJC HealthCare
Additionally, you can get hands-on experience with XML on DB2 for Linux, UNIX, and Windows and DB2 for z/OS in the following hands-on labs:
Lab 1255A
Beyond SQL With IBM DB2 10: Maintaining and Querying XML Data and RDF Stores
Thu, Oct 25, 2012, 1:30 PM – 4:30 PM
Lab 1314A
New Features in IBM DB2 10 for z/OS Improve Development Productivity (including XML features)
Mon, Oct 22, 2012, 10:00 AM – 1:00 PM
If you can’t be in Las Vegas this week (maybe because you’re on the other side of the globe?), how would Berlin work for you?!
The annual European conference of the International DB2 User Group (IDUG) is coming up in Berlin, Germany, on November 4 to 9, 2012. For a number of years there hasn’t been an IDUG conference without any XML sessions, and the same is true this year. Here are some presentations that can you see in Berlin:
“An XML Document’s Life – Dr. Node!“
Speaker: Terri Grissom, BMC Software
Wed, Nov 07, 2012, 11:00 AM – 12:00 PM
“Breaking the Relational Limit with pureXML in DB2 for z/OS”
Speaker: Mengchu Cai, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 01:00 PM – 02:00 PM
“How to Design a Hybrid XML/Relational Database Schema“
Speaker: Matthias Nicola, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 02:15 PM – 03:15 PM
I hope to see you either in Vegas or in Berlin! Enjoy your conference!
How to Validate XML Documents in Database Queries
September 13, 2012
In DB2, validation of XML documents against XML Schemas is optional. If you choose to validate XML documents in DB2, the most typical scenario is to validate XML documents when they are inserted or loaded into the database. This makes sense: if you ensure that the XML that enters the database is valid, then subsequent queries can assume that the data is valid and complies with a particular XML Schema.
Likewise, validation in XML updates statements ensures that document replacement or document modifications do not violate your XML Schema.
Here is a simple example for document validation in INSERT and UPDATE statements, based on an XML Schema that was registered in the DB2 XML Schema Repository (XSR) under the SQL name db2admin.myschema:
CREATE TABLE mytable (id INTEGER NOT NULL PRIMARY KEY, doc XML);
INSERT INTO mytable
VALUES(?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema));
UPDATE mytable
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;
UPDATE mytable
SET doc = XMLVALIDATE( XMLQUERY(‘copy $new := $DOC
modify do insert <status>delivered</activated>
into $new/message/header
return $new’)
ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;
There are also cases when you might want to validate XML as part of a query. There can be several reasons for that:
- Documents were inserted or updated without validation and you need to validate them before consumptions.
- You wish to validate XML documents against a different schema than the one was used for validation upon insert or update.
- You are extracting fragments of stored XML documents and wish to validate them against a specific schema.
- Your queries are constructing entirely new XML documents and you wish to vaidate that the constructed XML complies with a given schema.
Regardless of the motivation, XML validation in a query is simple.
You can simply use the XMLVALIDATE function in a SELECT statement. All the same options for XMLVALIDATE are allowed as if you would use it in an INSERT or UPDATE statement. Let’s look at several examples:
SELECT XMLVALIDATE(doc ACCORDING TO XMLSCHEMA ID db2admin.myschema)
FROM mytable
WHERE id = 5;
This query above reads a specific document and performs schema validation against the XML Schema that was registered as db2admin.myschema.
If the selected document is valid for the specified schema, the document is returned.
If the selected document is not valid for the specified schema, the query fails and produces an error code that points to why the schema is violated.
Instead of the XML column name doc, the XMLVALIDATE function can take any argument of type XML, such the result of an XMLQUERY function. The following query uses the XMLQUERY function to extract just the message body from an XML document and validates it against the XML Schema db2admin.msgbodyXSD:
SELECT XMLVALIDATE( XMLQUERY(‘$DOC/message/body’)
ACCORDING TO XMLSCHEMA ID db2admin.msgbodyXSD )
FROM mytable
WHERE id = 5;
The next query constructs a new XML document and validates it as part of the query:
SELECT XMLVALIDATE(
XMLQUERY(‘document{
<newdocument>
<header>{$DOC/party/identity}</header>
<body>
{$DOC/party/name}
{$DOC/party/details/address}
</body>
</newdocument>}’)
ACCORDING TO XMLSCHEMA ID db2admin.newdocXSD)
FROM mytable
WHERE id = 5;
These examples give you an idea of the capabilities for validating XML query results against an XML Schema.
Result set cardinalities when querying repeating XML elements
August 14, 2012
Recently I received some questions about the result sets when querying XML, and especially when querying repeating elements that occur more than once per document.
As it turns out, the same logical result can be returned in different ways, depending on how you write your XQuery or SQL/XML query.
Let’s look at a simple table with two XML documents, and then at several different queries against that data. Here is the sample data:
create table testtable(doc XML);
insert into testtable(doc)
values ('<a id="1">
<b>1</b>
<b>2</b>
</a>');
insert into testtable(doc)
values ('<a id="2">
<b>3</b>
<b>4</b>
<b>5</b>
</a>');
Now assume we want to return all the <b> elements from these two documents. You can write such a query in several different ways, each returning the same <b> elements in a slightly different way:
- XQuery FLWOR expression
- XQuery FLWOR expression within an SQL VALUES clause
- SQL/XML query with the XMLQUERY function
- SQL/XML query with the XMLTABLE function
Let’s look at each of these options in turn.
1. XQuery FLWOR expression
The fist example is a simple XQuery FLWOR expression. It iterates over the path /a/b in all documents and returns the <b> elements one by one. The result is a sequence of 5 elements, and each is returned as a single item in the result set:
xquery
for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
return $b';
<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>
5 record(s) selected.
2. XQuery FLWOR expression in an SQL VALUES clause
If you enclose the same FLWOR expression in an SQL VALUES clause then the same XML elements are returned in a different format.
In this example, the VALUES clause produces a single value. The SQL type of that value is the XML data type and the value itself is a sequence of 5 elements. The entire sequence is returned as a single value of type XML:
values(xmlquery('
for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
return $b'));
<b>1</b><b>2</b><b>3</b><b>4</b><b>5</b>
1 record(s) selected.
3. SQL/XML query with the XMLQUERY function
You could also write an SQL SELECT statement and include your XQuery or XPath expression in an XMLQUERY function.
Note that the XMLQUERY function is a scalar function, i.e. it returns one result value of type XML for each row that it is applied to. Since our sample table contains two rows, the following query returns two results values of type XML. The first value is a sequence with all the <b> elements from the first document, and the second value is the sequence of all <b> elements from the second document:
SELECT xmlquery('for $b in $DOC/a/b return $b') as col1
FROM testtable;
COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>
2 record(s) selected.
-- same result with a simple XPath:
SELECT xmlquery('$DOC/a/b') as col1
FROM testtable;
COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>
2 record(s) selected.
The potential benefit of this result format is that you now exactly which <b> elements came from the same input document. If you prefer to return each <b> element as a separate item, use the XMLTABLE function.
4. SQL/XML query with the XMLTABLE function
The XMLTABLE function is not a scalar function, it’s a table function. This means that it returns a set of result rows for each input document. More precisely, it return one result row for each item that is produced buy the row-generating expression /a/b:
-- return a column of type XML:
SELECT X.*
FROM testtable,
XMLTABLE('$DOC/a/b'
COLUMNS
col1 XML PATH '.') as X;
COL1
----------------------------
<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>
5 record(s) selected.
-- return a column of type integer:
SELECT X.*
FROM testtable,
XMLTABLE('$DOC/a/b'
COLUMNS
col1 INTEGER PATH '.') as X;
COL1
--------
1
2
3
4
5
5 record(s) selected.
The result sets in all of these example make sense and are consistent with SQL semantics. You can chose the shape of your query results and write your queries accordingly.


