XML and BLU

December 24, 2013

As you probably noticed, DB2 10.5 features a new technology called BLU Acceleration which provides significant gains in performance, reduced storage consumption, and simplicity for analytical database workloads. To this end, BLU combines column-organized storage with novel compression techniques, hardware-specific optimizations, data skipping, new techniques for join processing and aggregations, and other innovations.

Since the release of BLU in June 2013, I was asked several times whether BLU can be used with XML or how BLU relates to XML in general.

The short answer is that a BLU table, i.e. a column-organized table, cannot contain columns of type BLOB, CLOB, or XML at this point in time. If you use BLU in a DB2 10.5 database, your XML documents can be stored in the same database but need to reside in traditional tables which are not column-organized.

Fundamentally, XML and BLU follow two very different approaches to managing data.

The concept of XML documents is to keep all attributes for a given business object (e.g. an order, a product, a contract) physically together as one entity so that you can easily retrieve or insert all the relevant information for an object in one shot. This is particularly useful for complex objects whose attributes would otherwise be spread over many relational tables. XML is also useful for objects whose set of attributes can evolve over time, since XML also offers schema flexibility. The main focus of using XML is on the business object as the unit of interest.

In contrast, BLU is designed to store all attributes separately from each other, purposefully breaking up the objects, even more so than traditional storage formats for relational tables. The focus of BLU is on accessing individual attributes across all or many objects as efficiently as possible. The column-organized storage purposefully exploits the fact that many analytical questions never need to see whole objects but only a (small) subset of the attributes. The unit of interest is the individual attribute, not the entire objects.

As such, XML and BLU serve different purposes and access patterns, both valid in their own right.

Analytical queries that need to scan, join, filter, aggregate, etc. on individual attributes across a large number of objects will perform much better with BLU than with XML.

Applications that often need to read or write entire objects, esp. complex objects, perform much better with XML than with BLU. Similarly, queries that typically access most or all columns of a relational table perform better on traditional row-organized tables than on column-organized tables.

Now what if you have subset of attributes in your XML documents that you want to run analytical queries on, ideally across all or many of the documents? In this case you can consider extracting some of these attributes for redundant storage in a BLU table. This can be done with triggers, stored procedures, or application logic.

In a previous post I described how to write a query that produces a list of all elements and attributes that occur in a document or a set of documents. 

The core idea of that technique was to use the XPath expression $doc//(*, @*) in the XMLTABLE function. In this expression, $doc references the XML document or XML column that is used as input. This expression also uses the so-called comma operator that combines two squences into one. These two sequences are are all elements (*) and all attributes (@*).

For this technique to work in DB2 for z/OS, you need to code the XPath expression in a slightly different way:  ($doc//*,  $doc//@*)  , which lists all elements followed by all attributes.

It is useful to sort the result by the parent of each node in the document so that the elements and attributes that belong to the same parent appear in consecutive order in the result set.

The following listing shows two queries that produce the same result set, and the second of the two queries is the recommendad notation for DB2 for z/OS:


SELECT T.*
FROM purchaseorder p,
     XMLTABLE('$doc//(*, @*)' passing p.porder as "doc"
      COLUMNS
       node   VARCHAR(20) PATH 'name(.)',
       parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;

SELECT T.*
FROM purchaseorder p,
     XMLTABLE('($doc//*, $doc//@*)' passing p.porder as "doc"
      COLUMNS
       node   VARCHAR(20) PATH 'name(.)',
       parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;

 And then you can ertainly extend such queries to also list the node values or an indication whether the node is an element or an attribute, as previously described here.

Similar considerations apply to queries that list all paths in a document.

 

 

 

This blog on XML database technology and DB2 pureXML has been active for almost 5 years now. My former colleage Conor O’Mahony ran this blog from early 2008 to December 2009, and I took over on Dec 15, 2009. We have received a lot of feedback and many of the blog posts were inspired by questions from XML database users.

I’m not sure yet how much time I will have for blogging in 2013 and beyond. In any case, I liked the idea of producing a summary page that provides a quick overview of all the topics and posts in this blog. But, how to produce such a summary most efficiently?

Well, true to the topic of this blog I decided to load this blog as a piece of XML into DB2 pureXML and run queries on it!

First I downloaded the complete blog as an HTML document. But, HTML is not necessarily wellformed XML because HTML allows elements with missing end tags and other goofy things. Luckily there are free tools available that convert HTML into XHTML. XHTML is a form of HTML that complies with the rules for well-formed XML documents.

After converting the entire blog into an XML (XHTML) document, it took only three steps to create a list of blog post titles, dates, and URLs:

CREATE TABLE nativexmldatabase(doc XML);


LOAD FROM blog.del OF DEL REPLACE INTO nativexmldatabase(doc);


SELECT date, title, url
FROM nativexmldatabase,
   XMLTABLE('$DOC//div[@class="entrytitle"]'
   COLUMNS
     seq     FOR ORDINALITY,
     date    VARCHAR(20)   PATH 'h3',
     title   VARCHAR(105)  PATH 'h2/a',
     url     VARCHAR(145)  PATH 'h2/a/@href') AS T;

Of course, a quick look into the XML document is necessary to discover the required tag names and XPath expressions, such as //div[@class="entrytitle"] to iterate over all the individual posts, or h2/a and h2/a/@href to retrieve the title and URL of each post.

This query produces a nice list of all blog post titles, dates, and URLs. Instead of posting this listing as-is, I decided to roughly group the blog posts along the following topic areas:

* Design
* Performance
* Migration
* Relational to XML or XML to relational
* XML Queries
* XML Storage and Indexing
* XML Schemas
* Application Development
* Use Cases and Case Studies
* ETL and Warehousing
* DB2 pureXML Product News
* Other Products and Tools
* Publications and Resources
* XML Coverage at IDUG and IOD Conferences
* News, Announcements, Miscellaneous

And here is the result, a table of contents for nativexmldatabase.com from 2008 through 2012. I hope this can serve as a quick reference to topics of interest.

Date Design
Dec 4, 2012 Good and bad XML design
Jan 21, 2012 Business Records in the 21st Century
Jan 8, 2012 Data Normalization Reconsidered
Jan 21, 2011 Name/Value Pairs – A pretty bad idea in XML as in Relational !
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Sep 28, 2010 5 Reasons for Storing XML in a Database
Sep 26, 2008 Flexible Schemas: When to Persist Data in XML Instead of Relational
Aug 4, 2008 When to Store in XML Format
Mar 21, 2008 When to use a Native XML Database
Date Performance
Oct 23, 2012 Overcoming Performance Obstacles in XML Encryption
Nov 15, 2011 TPoX 2.1 has been released!
May 9, 2011 News Flash: Intel publishes TPoX Benchmark results on new 10-core Westmere-EX CPUs
Mar 4, 2011 New TPoX Benchmark Results Available
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Aug 22, 2010 XML versus Relational Database Performance
July 14, 2010 A 10TB XML Data Warehouse Benchmark
Nov 3, 2008 Benchmark for 1TB Transactional XML System
Mar 20, 2008 Native XML versus CLOB and Shredding
Date Migration
June 29, 2012 How to migrate from the XML Extender to DB2 pureXML
Oct 3, 2010 How to migrate XML from LOB to XML columns
Date Relational to XML or XML to relational
Aug 30, 2010 XML Construction in Views and User-Defined Functions
Aug 24, 2011 How to generate XML from relational data – with line breaks!
June 8, 2011 How to quickly produce XML from relational tables
June 5, 2011 How to insert relational data into existing XML documents (Part 2)
May 21, 2011 How to add relational data into existing XML documents (Part 1)
Date XML Queries
Sep 13, 2012 How to Validate XML Documents in Database Queries
Aug 14, 2012 Result set cardinalities when querying repeating XML elements
Aug 10, 2012 Using the XMLTABLE function with MERGE statements in DB2 for z/OS
July 30, 2012 Using the XMLTABLE function in UPDATE and MERGE statements
May 24, 2012 How to query CDATA sections in XML
Feb 4, 2012 How to list the paths of all elements in an XML document?
Nov 30, 2011 XQuery support in DB2 10 for z/OS
Oct 9, 2011 Advanced SQL/XML: Joins and FLWOR Expressions in the XMLTABLE Function
Aug 29, 2011 Quantified expressions in XQuery: When ‘some’ and ‘every’ satisfy!
Mar 18, 2011 XQuery and SQL/XML: How to convert a date that is not a date into a date?
Aug 11, 2010 How to order query results based on XML values
May 26, 2010 XML Profiling – How to get a list of all elements and attributes
May 9, 2010 How to deal with dirty data
July 24, 2011 XQuery and Other XML Manipulation in SQL Stored Procedures
April 6, 2011 Returning data from one path based on predicates in another (part 2/2)
April 3, 2011 Intra-document references: Returning data from one path depending on predicates in another path
June 17, 2010 An XQuery Cheat Sheet
April 12, 2010 The XMLTABLE function and a case where a left outer join can help
Feb 21, 2010 Get Started with XML Full-Text Search in DB2
Jan 25, 2010 tokenize it !
Jan 11, 2010 Watch your step! Positional predicates in XPath.
Oct 14, 2008 XQuery: Introduction, Tips, and Future Directions
July 25, 2008 XQuery versus SQL/XML – Part Deux
June 26, 2008 XQuery versus SQL/XML
Date XML Storage and Indexing
July 11, 2012 XML Storage in DB2: To inline or not to inline?
June 8, 2012 XML Indexing in DB2 9.x and DB2 10 for Linux, UNIX, and Windows
July 11, 2011 How to index XML documents in the presence of schema diversity?
July 19, 2010 Structure-agnostic XML indexes in DB2
Dec 26, 2010 How to insert XML documents from the file system?
Feb 21, 2010 Get Started with XML Full-Text Search in DB2
May 6, 2009 DB2 Compresses XML Data by 60% to 80%
May 3, 2009 Multi-Dimensional Clustering for Tables with Native XML Data
May 1, 2009 Use Hash Partitioning for Fast Analysis of XML Data
Oct 16, 2008 DBA Concerns about Native XML Storage
April 8, 2008 Native XML Reduces Storage Costs
Date XML Schemas
Sep 13, 2012 How to Validate XML Documents in Database Queries
Feb 29, 2012 What is an XML Schema and why should I care?
Aug 22, 2008 Schema Evolution
Aug 21, 2008 Schema Flexibility
Date Application Development
Sep 29, 2011 At your service: REST with DB2 pureXML!
Aug 8, 2011 Using XQuery across Application Server and Database Server
July 30, 2011 Beyond the Database: Native XML and XQuery in the Application Server
Feb 25, 2011 CICS meets XML: Using CICS Web Servives with DB2 pureXML
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Oct 8, 2010 Buidling XML Solutions with Rational Software Architect (RSA) and Rational Application Dev
Mar 1, 2010 XML Support in JDBC 4.0: The SQLXML Interface
Nov 9, 2009 Building a JSON and DB2 pureXML Application
Aug 5, 2008 Native XML Storage Reduces Development Costs
Nov 18, 2008 Referential Integrity and XML Data
Date XML Use Cases and Case Studies
Sep 12, 2011 DB2 pureXML – Rich in Proteins!
June 24, 2011 A true story about using XML forms and DB2 to process millions of tax returns
May 12, 2010 Webcast on XML Use Cases (May 18)
Feb 7, 2010 BJC HealthCare Improves Clinical Research with DB2 pureXML
Nov 2, 2009 Electronic Forms Using Adobe PDF and IBM DB2 pureXML
Sep 8, 2009 The Patient-Centered Medical Home
Aug 24, 2009 Electronic Health Records for Smarter Healthcare
July 17, 2009 Article about XML for Healthcare
June 16, 2009 Building an XML-Based Electronic Forms Solution
June 5, 2009 Short Video Highlighting Early Experiences with DB2 9.7
Mar 30, 2009 GUS Group uses DB2 pureXML for ERP System
Mar 13, 2009 UCLA Health System uses XML and SOA to Optimize Data Integration and Migration
Mar 2, 2009 Storing and Retrieving ACORD Data for Insurance
Feb 26, 2009 Speedy Communications and Collaboration in XML-based Environments
Dec 30, 2008 XForms and Native XML Storage
Mar 27, 2008 How to Choose Solution Areas
Mar 25, 2008 More information about Storebrand
Mar 23, 2008 Productivity Improvements at Storebrand
Date ETL and Warehousing
April 13, 2011 XML ETL with DataStage 8.5
July 14, 2010 A 10TB XML Data Warehouse Benchmark
June 23, 2010 More on Data Warehousing and XML
April 5, 2010 More on XML ETL
Mar 26, 2010 XML ETL
Jan 31, 2010 New Article on XML in the Data Warehouse
May 28, 2009 XML in the Data Warehouse
April 22, 2009 Fast Performance for Native XML Data in the Warehouse
Aug 24, 2009 Article on Integrating XML Operational Data into a Data Warehouse
Date DB2 pureXML product news
April 19, 2012 XML Enhancements in DB2 10 for Linux, UNIX, and Windows
Nov 30, 2010 Node-level XML Updates in DB2 10 for z/OS
Nov 18, 2010 New XML Features in DB2 10 for z/OS !
June 8, 2010 Updated version of a free XML database: DB2 Express-C 9.7.2
Aug 7, 2009 SOA Projects: IBM DB2 versus Oracle Database
Feb 11, 2009 XML Database in the Cloud
Feb 10, 2009 pureXML Moves into Core DB2 Product
Date Other Products and Tools
Dec 5, 2010 XML Tools for DB2 pureXML
Mar 8, 2010 Integration of Altova XMLSpy with DB2 pureXML
Jan 18, 2010 XPath and XQuery Functions in SQL Server vs. DB2
Jan 4, 2010 Shredding and constructing XML in mySQL
Dec 23, 2009 XML Support in mySQL
Dec 17, 2009 Oracle vs. mySQL: The Full Monty
Aug 7, 2009 MyCareTeam and IBM Collaborate to Improve Continuing Care for Diabetes
July 8, 2009 Why Won’t Oracle Publish XML Benchmark Results for TPoX?
Sep 30, 2008 XML in Oracle 11g
Sep 24, 2008 XML in SQL Server 2008
Sep 19, 2008 XML in SQL Server 2005
July 11, 2008 Questions for XML Database Vendors
May 13, 2008 XML-only Databases
May 12, 2008 All Native XML Databases are not Equal
Date Publications and Resources
Feb 13, 2011 Extremely pureXML in DB2 10 for z/OS
Sep 14, 2010 DB2 pureXML Cookbook – Errata
Aug 13, 2009 DB2 pureXML Cookbook – 45% Discount
June 3, 2009 DB2 pureXML for Dummies . Get Your Copy!
April 29, 2010 Get off to a fast start with DB2 pureXML
April 10, 2009 DB2 pureXML Cookbook
Feb 25, 2009 Intel Insights Magazine covers Native XML Storage
Feb 5, 2009 Henrik’s Blog
Feb 5, 2009 DB2 pureXML leads SOA World Magazine Reader’s Choice Award
Jan 21, 2009 Podcast Series about DB2 pureXML on the Mainframe
Jan 20, 2009 Reports for Information in Native XML Format
Dec 30, 2008 Anant Jhingran on Improving XML Applications
Sep 1, 2008 DB2 pureXML Online Communities
July 27, 2008 Getting Started with Native XML Databases
Date XML Coverage at IDUG and IOD Conferences
Oct 21, 2012 XML from Vegas to Berlin!
Oct 18, 2011 XML in Las Vegas !
April 25, 2011 XML sessions at IDUG North America 2011
Oct 21, 2010 Next week in Vegas
April 18, 2010 DB2 for z/OS pureXML Hands-on Lab @ IDUG North America
Oct 23, 2009 DB2 pureXML at the IOD Conference
May 28, 2009 IBM IOD Conference Attendees get a Poken for Networking
May 13, 2009 Short Video from the IDUG North America Confernece
Nov 11, 2008 XML at IBM Information on Demand Conference
Sep 11, 2008 DB2 pureXML at the IOD Conference
Sep 4, 2008 Meet Native XML Databases Users
Date News, Announcements, Miscellaneous
July 30, 2010 DB2 Express-C Light – A reduced footprint XML database
Mar 15, 2010 More DB2 pureXML Bootcamps
Feb 14, 2010 DB2 pureXML Webinars and Bootcamps
Dec 15, 2009 Hi there here is the new guy!
Dec 13, 2009 Welcome Matthias Nicola!
June 9, 2009 Flirting with Poken
April 22, 2009 The XML Rap Superstar
April 22, 2009 IDUG announces winners of XML Challenge
April 20, 2009 Teaser for Upcoming DB2 pureXML Features
Mar 12, 2009 DB2 and Cloud Computing – Chat with the Labs
Feb 5, 2009 Free Online Conference – Data in Action
Jan 20, 2009 DB2 Technology Sandbox
Dec 26, 2008 Native XML Storage for the Mac
Dec 3, 2008 Update on the XML Challenge
Nov 5, 2008 Why IBM DB2 is ideal for Transactional XML
Oct 9, 2008 10 Reasons why DBAs Should Understand Native XML
Oct 2, 2008 XML Challenge Web Site is Live!
Sep 28, 2008 1st Oct is Online Community Action Day
Aug 7, 2008 Webcast: A primer for storing and retrieving XML data
July 1, 2008 Viral Video – Wednesday
June 20, 2008 Viral Video – Tuesday
June 16, 2008 Viral Video – Monday
June 13, 2008 Learn to Use XML with Databases and win Prizes!
Mar 22, 2008 How to Market New Technology
Mar 19, 2008 Welcome

Happy browsing!

Good and bad XML design

December 4, 2012

Whenever people design things –such as houses, software systems, database schemas, or even XML structures– there are good and not so good design options.  Roughly speaking, a design is usually considered “good” if it allows for easy and (cost-)efficient use and implementation of whatever is being built.

For XML documents we have seen various cases of “good” and “bad” designs over the years. Previously I wrote about name/value pair design for XML documents, which often is not a good choice.

Recently I received a question regarding another design choice, which I would like to share here in a simplified form.

The question was about an XML message format to transmit a list of order numbers and the corresponding order system identifiers. Two alternative XML structures were proposed:

Option A:

<OrderList>
  <Order>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
  </Order>
</OrderList>

Option B:

<OrderList>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
</OrderList>

The motivation for option B might be a smaller message size. One might argue that XML elements are by definition ordered, and if we know that the message consists of pairs of order number and order system, then XML format B is as easy to interpret as format A. Or, is it not?

The problem that I see with option B is that the absence of the <Order> elements gives up the explicit structure of the data and instead requires implicit knowledge, i.e. one must know that the OrderNumber at position i belongs to the OrderSystem at position i+1.

In my view, this design is against the principles of XML. In XML, the tags and their nesting should describe the data explicitly. XML is meant to be a hierarchical data format so that data items that belong together are grouped under the same parent (as in option A). After all, that is the benefit of XML over a flat file format.

I also think that using the explicit structure of option A is less error-prone than using option B which, requires implicit knowledge based on element positions.

How about queryability?

It turns out that XML format A is considerably easier to query than option B. Let’s assume you need to code a SQL/XML query to list the order numbers and systems. For that exercise we assume that the messages are stored in a table “mytable” with XML column “doc”.

The query for XML option A is very simple, using the XMLTABLE function in its most basic form. The row-generating XPath expression $DOC/OrderList/Order iterates over the orders, and the column expressions extract the order number and system.

http://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/

-- Query for XML option A:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE ('$DOC/OrderList/Order'
       COLUMNS
         OrderSystem VARCHAR(20) PATH 'OrderSystem',
         OrderNumber VARCHAR(20) PATH 'OrderNumber' );

ORDERNUMBER          ORDERSYSTEM
-------------------- --------------------
A0000001             ABC
B0000001             XP1
C0000001             Q-9

3 record(s) selected.

How can you produce the same result set from XML format B? This is remarkably tricky! You need to iterate over the child elements and keep track of which element is at position i versus i+1 to produce the proper pairs in the output.

There are likely several ways to code the query for XML option B, and you see one solution below. It seems hard to query the elements by name alone because the important structural information is by position. Hence, this query iterates over the positions 1 to N, where N is the number of elements. For each even position 2, 4, 6, etc. the query produces the pair of the current ($pos) and the previous ($pos – 1) element to construct the proper output.

-- Query for XML option B:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE('for $pos in (1 to count($DOC/OrderList/*))
               where $pos mod 2 = 0
               return  <pair>
                         {$DOC/OrderList/*[$pos - 1]}
                         {$DOC/OrderList/*[$pos]}
                       </pair>'
       COLUMNS
         OrderNumber VARCHAR(20) PATH 'OrderNumber',
         OrderSystem VARCHAR(20) PATH 'OrderSystem' );

OrderSYSTEM          OrderNUMBER
-------------------- --------------------
ABC                  A0000001
XP1                  B0000001
Q-9                  C0000001

3 record(s) selected.

This query is not only more complex but also potentially less efficient. Note that the return clause of the FLOWR expresion constructs the explicit pairing that it missing in XML format B.

Let me know if you can think of a simpler query to produce the same result from XML format B.

Next, imagine a message that doesn’t have just 2 fields per order but maybe 10 or 100 fields per order. Then option B becomes increasingly more ugly.

Or worse, what if there is a schema change that allows one order to have a different number of fields than the next? Then the query that we coded for XML format B will no longer work because it was based on a specific structural assumption that is no longer true. In contrast, our query for XML format A will work even if you add additional fields to some of the orders in the message.

In summary, the benefits for XML option A include the following:

  • Explicit structure rather than implicit positional format
  • Easier and more efficient to query (query by name rather query by postion)
  • More resilient to schema changes

I think that these benefits far outweigh the concern that XML format A is larger. Also, you could chose to send or store the XML in a compressed format to greatly alleviate space concerns.

In times of data leakage, hacker attacks, trojan horses, and various other data security threats, efficient and effective data encryption has becomes a critical requirement for many enterprises. This need applies to XML as much as to most other forms of data.

Intel and IBM have collaborated to demonstrate that a combination of modern hardware and software can perform effective encryption with very high efficiency.

In particular, the latest Intel Xeon E5 and E7 processor families provide AES-NI, which is a set of new instructions in the Intel Xeon processor that help accelerate encryption, decryption, key generation, matrix manipulation, and carry-less multiplication.

A joint benchmark has measured the performance impact of encrypting and decrypting XML data in DB2 9.7 using the IBM InfoSphere Guardium Data Encryption capabilities on the Intel Xeon E5 platform.

The results of the TPoX benchmark show that full encryption can be performed for a read/write XML transaction processing application with less than 4% overhead. This is a fantastic result.

More information on these tests and on the technologies and products used, is available here:

http://software.intel.com/en-us/articles/overcoming-performance-obstacles-in-data-encryption

http://software.intel.com/sites/default/files/m/d/4/1/d/8/IBM_DMM_Intel_AES_Vormetric_FINAL.pdf

http://ibmdatamag.com/2012/05/overcoming-performance-obstacles-in-data-encryption/

XML from Vegas to Berlin!

October 21, 2012

I’m in Las Vegas right now, this glittering and crazy city in Nevada! Las Vegas is once again the location for IBM’s annual Information On Demand (IOD) conference, which is starting today!

IOD is the premier IBM event for Information Management software, including DB2, Informix, Netezza, Cognos, Big Data, SPSS, Content Maagement, and other product areas. The conference program is very diverse and has something for everyone. At IOD you can choose from more than 1500 sessions and 100+ hands-on labs.

Not surprisingly, XML continues to be an important topic for information management, and there are various XML-related sessions at IOD. In particular, I’m looking forward to the following sessions where some of IBM’s customers are sharing their success stories with DB2 pureXML:

Session 3674A:
IBM pureXML in Financial Applications: Experiences From Vanguard
Mon, Oct 22, 201, 11:30 AM – 12:30 PM
Speakers: George White and Milton Beaver, The Vanguard Group

Sessio 1284B:
IBM DB2 and XML: Excellent Opportunity or Extra Problems? Are You Ready?
Mon, Oct 22, 2012, 2:15 PM – 3:15 PM
Speaker: Kurt Struyf, suadasoft

Session 1613A:
How to Get Warehouse-Type Performance With XML Tables
Wed, Oct 24, 2012, 2:30 PM – 3:30 PM
Speaker: Ray Sippel, BJC HealthCare

Additionally, you can get hands-on experience with XML on DB2 for Linux, UNIX, and Windows and DB2 for z/OS in the following hands-on labs:

Lab 1255A
Beyond SQL With IBM DB2 10: Maintaining and Querying XML Data and RDF Stores
Thu, Oct 25, 2012, 1:30 PM – 4:30 PM

Lab 1314A
New Features in IBM DB2 10 for z/OS Improve Development Productivity (including XML features)
Mon, Oct 22, 2012, 10:00 AM – 1:00 PM

If you can’t be in Las Vegas this week (maybe because you’re on the other side of the globe?), how would Berlin work for you?!

The annual European conference of the International DB2 User Group (IDUG) is coming up in Berlin, Germany, on November 4 to 9, 2012. For a number of years there hasn’t been an IDUG conference without any XML sessions, and the same is true this year. Here are some presentations that can you see in Berlin:

An XML Document’s Life – Dr. Node!
Speaker: Terri Grissom, BMC Software
Wed, Nov 07, 2012, 11:00 AM – 12:00 PM

“Breaking the Relational Limit with pureXML in DB2 for z/OS”
Speaker: Mengchu Cai, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 01:00 PM – 02:00 PM

“How to Design a Hybrid XML/Relational Database Schema
Speaker: Matthias Nicola, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 02:15 PM – 03:15 PM

I hope to see you either in Vegas or in Berlin! Enjoy your conference!

In DB2, validation of XML documents against XML Schemas is optional. If you choose to validate XML documents in DB2, the most typical scenario is to validate XML documents when they are inserted or loaded into the database. This makes sense: if you ensure that the XML that enters the database is valid, then subsequent queries can assume that the data is valid and complies with a particular XML Schema.

Likewise, validation in XML updates statements ensures that document replacement or document modifications do not violate your XML Schema.

Here is a simple example for document validation in INSERT and UPDATE statements, based on an XML Schema that was registered in the DB2 XML Schema Repository (XSR) under the SQL name db2admin.myschema:

CREATE TABLE mytable (id INTEGER NOT NULL PRIMARY KEY, doc XML);

INSERT INTO mytable
VALUES(?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema));

UPDATE mytable
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

UPDATE mytable
SET doc = XMLVALIDATE( XMLQUERY(‘copy $new := $DOC
                                 modify do insert <status>delivered</activated>
                                           into $new/message/header
                                 return $new’)
             ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

There are also cases when you might want to validate XML as part of a query. There can be several reasons for that:

  •  Documents were inserted or updated without validation and you need to validate them before consumptions.
  •  You wish to validate XML documents against a different schema than the one was used for validation upon insert or update.
  •  You are extracting fragments of stored XML documents and wish to validate them against a specific schema.
  •  Your queries are constructing entirely new XML documents and you wish to vaidate that the constructed XML complies with a given schema.

Regardless of the motivation, XML validation in a query is simple.

You can simply use the XMLVALIDATE function in a SELECT statement. All the same options for XMLVALIDATE are allowed as if you would use it in an INSERT or UPDATE statement. Let’s look at several examples:

SELECT XMLVALIDATE(doc ACCORDING TO XMLSCHEMA ID db2admin.myschema)
FROM mytable
WHERE id = 5;

This query above reads a specific document and performs schema validation against the XML Schema that was registered as db2admin.myschema.
If the selected document is valid for the specified schema, the document is returned.
If the selected document is not valid for the specified schema, the query fails and produces an error code that points to why the schema is violated.

Instead of the XML column name doc, the XMLVALIDATE function can take any argument of type XML, such the result of an XMLQUERY function. The following query uses the XMLQUERY function to extract just the message body from an XML document and validates it against the XML Schema db2admin.msgbodyXSD:

SELECT XMLVALIDATE( XMLQUERY(‘$DOC/message/body’)
            ACCORDING TO XMLSCHEMA ID db2admin.msgbodyXSD )
FROM mytable
WHERE id = 5;

The next query constructs a new XML document and validates it as part of the query:

SELECT XMLVALIDATE(        
         XMLQUERY(‘document{
                    <newdocument>
                      <header>{$DOC/party/identity}</header>
                      <body>
                          {$DOC/party/name}
                          {$DOC/party/details/address}
                      </body>
                   </newdocument>}’)
       ACCORDING TO XMLSCHEMA ID db2admin.newdocXSD)
FROM mytable
WHERE id = 5;

These examples give you an idea of the capabilities for validating XML query results against an XML Schema.

 

Follow

Get every new post delivered to your Inbox.

Join 50 other followers