In my previous post I talked about XML in the data warehouse. While using XML in a data warehouse may seem like a very novel idea to some database professionals, it is already a reality for others.

One example can be found at BJC Healthcare, one of the largest non-profit health care organizations in the USA. To give you an idea of the size of their operation: they run 13 hospitals, multiple community health facilities, and they have more than 26,000 employees and a net revenue of $3.2 billion. Their challenges and goals included improving the data collection required for grant applications, identifying suitable patients for medical studies, and improving patient treatment. The IT solution had to protect patient privacy and provide adequate performance and scalability to handle the increasing amounts of data.

Given the diverse and evolving nature of medical data, BJC decided to use XML as the format for patient medical records, lab results, and other clinical information. This decision provides two key benefits to their application. First, XML provides the flexibility that is required to handle variable data and future schema evolution. Second, the use of XML has allowed BJC to design a simple and intuitive database schema with less than 10 tables. Storing the same information in a fully relational database schema would have required over 100 tables and many queries would have to join 20 or more tables, which is complex and can often be inefficient.

At the beginning of the project, BJC evaluated DB2 as well as other databases that support XML data. BJC chose DB2 for a variety of reasons, including ease of use of the pureXML features in DB2 as well as a very short time to get up and running with a prototype.

The DB2 pureXML database at BJC currently holds about 2.5 terabytes worth of medical records in XML format. The browser-based application interface allows users to define complex analytical search requests that are automatically translated into XQuery and SQL/XML statements and submitted to DB2.

Read the full BJC case study for more details.

IBM has released a new white paper on use cases and benefits of using XML as a data format in a data warehouse:

XML: Changing the data warehouse
ftp://ftp.software.ibm.com/common/ssi/sa/wh/n/imw14238usen/IMW14238USEN.PDF

The article describes cases where using XML in a warehouse can be beneficial and improve the flexibility and evolution of a data warehouse schema to react to new requirements in a simple and cost-effective manner. Typically this is not a question of either XML or relational, but a question of using the right mix of both.

I think there are at least three use cases for using DB2’s pureXML features in a warehousing environment:

  1. You have or receive XML documents and you want to use them in a warehouse (for reporting queries, etc.) without shredding them to relational tables. You will typically use SQL/XML queries, possibly plugged into reporting tools such as Cognos, or you can use relational views over the XML data.
  2. You can add XML columns to existing relational dimension tables to make these tables more flexible, especially when the number of dimensional attributes keeps increasing as a company keeps adding new products and services over time. The above-mentioned white paper elaborates on this use case.
  3. In some cases you may prefer to shred incoming XML documents partially or entirely to a relational warehouse schema. In that case you will use the pureXML features too, such as XMLTABLE functions to accomplish the shredding or even simple ETL-ish tasks. However, not all types of XML lend themselves to shredding, in which case you’re back to point 1. above.

If you want to dive deeper into the technical features in DB2 that support XML in a data warehouse, such as partitioning, clustering, and compression, you may want to look at the following article:

Enhance business insight and scalability of XML data with new DB2 9.7 pureXML features
http://www.ibm.com/developerworks/data/library/techarticle/dm-0904db297purexml/

A common question related to XML in a data warehouse is how to use reporting tools with XML data, or with a mix of XML and relational data. Here is an article that describes how to do this with Cognos on top of DB2 pureXML:

Create business reports for XML data with Cognos 8 BI and DB2 pureXML http://www.ibm.com/developerworks/data/library/techarticle/dm-0811saracco/index.html



And to round off the data warehouse topic, there is a two-part article on IBM InfoSphere DataStage and DB2 pureXML:
http://www.ibm.com/developerworks/data/library/techarticle/dm-0908datastagepurexml1/
http://www.ibm.com/developerworks/data/library/techarticle/dm-0909datastagepurexml2/

tokenize it !

January 25, 2010

In my previous post I wrote about XPath and XQuery functions in DB2 vs. SQL Server 2008. One particularly powerful function is tokenize. The tokenize function takes a string as input and splits it into several substrings based on a repeating delimiter. The delimiter can be a single character or a pattern defined by a regular expression. The tokenize function is available in DB2 for z/OS (here) and DB2 for Linux, UNIX, and Windows (here).   

Here is a very simple example where the string “1-3-26-43-74″ is split along the character “-”. This eample is executed in the Command Line Processor of DB2 for Linux, UNIX, and Windows:   


db2 => xquery tokenize("1-3-26-43-74", "-");

1
3
26
43
74
 
  5 record(s) selected.
 
db2 =>

You might sometimes see that XPath function names are prefixed with “fn:”, which is the namespace prefix for functions. However, the namespace for XPath functions is implicitly predeclared so that you can omit the prefix “fn:”. That is, you can simply use the function name tokenize instead of fn:tokenize, as I did above. 

Here is another example where the tokenize function comes in handy.  In this example an XML document describes (some of) the environment variables of a user account on a Linux machine. Some enviroment variables, such as PATH, CLASSPATH, etc. contain lists of paths delimited by the colon (:) sign.   


create table myenv(doc XML);

insert into myenv values('
<ENVIRONMENT os="Linux">
   <HOSTNAME>hermes</HOSTNAME>
   <PATH>/home/mnicola/bin:/usr/bin:/usr/local/bin:/opt/newapp/bin:
         /home/mnicola/sqllib/bin:/home/mnicola/sqllib/adm:
         /home/mnicola/sqllib/misc:/home/mnicola/sqllib/int</PATH>
   <MANPATH>/usr/local/man:/usr/share/man:/usr/X11R6/man</MANPATH>
   <CLASSPATH>.:/home/mnicola/sqllib/java/db2java.zip:/home/mnicola/sqllib/java/db2jcc.jar:
              /home/mnicola/sqllib/java/sqlj.zip:/home/mnicola/sqllib/java/runtime.zip:/test</CLASSPATH>
</ENVIRONMENT>');
 

Then you can use the following simple query in DB2 to read and tokenize the CLASSPATH and return each path on a separate row:

xquery
db2-fn:xmlcolumn("MYENV.DOC")/ENVIRONMENT[HOSTNAME="hermes"]/tokenize(CLASSPATH, ":");
 
--------------------------------
.
/home/mnicola/sqllib/java/db2java.zip
/home/mnicola/sqllib/java/db2jcc.jar
/home/mnicola/sqllib/java/sqlj.zip
/home/mnicola/sqllib/java/runtime.zip
/test
 
  6 record(s) selected.
 

To be precise, the tokenize function returns a sequence of strings. On a sequence you can apply positional predicates to pick selected items from the sequence. The following query uses the tokenize function with positional predicates [1], [2], and [3] to obtain the first, second, and third entry from the MANPATH enviroment variable: 


SELECT XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[1] '
                 passing doc as "d") as path1,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[2] '
                 passing doc as "d") as path2,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[3] '
                 passing doc as "d") as path3
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' PASSING doc as "d");
 

PATH1                PATH2                PATH3
-------------------- -------------------- --------------------
/usr/local/man       /usr/share/man       /usr/X11R6/man
 
 1 record(s) selected.
 

In the next example the XPath function count is wrapped around tokenize to obtain the number of entries of the CLASSPATH.


SELECT XMLQUERY('$d/ENVIRONMENT/count(tokenize(CLASSPATH, ":")  )' 
                 passing doc as "d")  as length_classpath
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' passing doc as "d");

LENGTH_CLASSPATH 
-----------------------------
6 

 1 record(s) selected.
 

I’m sure you will find many more use cases where the tokenize function is helpful!

XQuery is a feature-rich language that allows you to code sophisticated queries against XML data. As you probably know, the XPath language is a subset of XQuery. XPath and XQuery share a common set of built-in functions that are defined by the W3C specification titled “XQuery 1.0 and XPath 2.0 Functions and Operators“. This spec describes dozens and dozens of functions that deal with strings, numbers, dates, timestamps, XML nodes and sequences, and so on.

The DB2 support for the XPath and XQuery functions is documented in the DB2 Information Center, and you might find corresponding information for other database systems on their respective web sites. However, other databases might not support as many of the XQuery functions as DB2. For example, DB2 9.7 for Linux, UNIX, and Windows supports 21 XQuery string functions while SQL Server 2008 seems to have just six. DB2 also supports over 30 XQuery date and time functions, SQL Server seems to support none of those.

Admittedly, some functions are more commonly used then others. XQuery functions such as “substring” or “count” are used quite often. In contrast, the XQuery functions “reverse” and “string-to-codepoints” tend to be used much less frequently.

But then there are some functions that you might not use every day, but when you do need them you really don’t want to miss them. For example, sometimes you may have to
• search for strings that start with a given prefix (starts-with)
• determine whether a string matches a given pattern (matches)
• remove unnecessary spaces from a string (normalize-space)
• compute the absolute value of a number (abs)
• convert a timestamp to a different time zone (adjust-dateTime-to-timezone)
• decide whether two XML documents are identical in all their structure and values (deep-equal)

These and other tasks are common enough. Yes, you can write application code to perform such tasks, but that is much less convenient and less efficient than using a built-in function that’s readily available. Being aware of the available built-in functions can make your life much easier when you develop XML applications. The functions listed above are available in DB2 pureXML, but these and many others don’t seem to be supported in SQL Server 2008.

Some time ago I received an interesting question about how to code a specific XPath expression. I think the question and the answer are worthwhile sharing here. 

The question: 

Consider the following XML document. The document describes an article that has a title, a publication date, and several sections. Each section has a body, and some sections also contain a reference to a figure. Admittedly, I contrived this document to be able to explain this issue in a simple way. 

 <article id="62083">
  <title>A short story about apple pie</title>
  <pubdate>2009-08-21T09:30:00</pubdate>
  <section>
      <body>Once upon a time...</body>
  </section>
  <section>
      <body>...</body>
      <figure>drawing15.jpg</figure>
  </section>
  <section>
      <body>...</body>
      <figure>screenshot85.tif</figure>
  </section>
  <section>
      <body>...</body>
  </section>
</article>

Now the question is: how can I retrieve the first or the last <figure> from the article? The answer is slightly more tricky than what you might think! You probably know that XPath allows for positional predicates that choose elements based on the order in which they appear. For example, if the variable $doc holds the document shown above, then the following XPath expressions retrieve the first and the last <section> of the article, respectively:

$doc/article/section[1]
$doc/article/section[last()]

But, the question was how to retrieve the first or last <figure>. Intuitively, one might try the following XPath expression to obtain the first <figure> element:

$doc/article/section/figure[1]

However, this XPath returns both <figure> elements from the document:    

    <figure>drawing15.jpg</figure>
    <figure>screenshot85.tif</figure> 

This result may seem unexpected, but it is perfectly correct. The XPath expression has asked for the first figure within each section, and that’s exactly what we got: the first figure from each section. It’s just not what we wanted.   

The solution:

There are several ways to retrieve the first <figure> of the entire document rather than the first <figure> of each section. One approach is to use parentheses in order to apply the positional predicate [1] to the sequence of all <figure> elements, rather than to the figures within each section:

($doc/article/section/figure)[1]

Another approach is to navigate to the first section that actually contains a <figure>, and then to return the first <figure> element from that section:

$doc/article/section[figure][1]/figure[1]

Note that the XPath step section[figure][1] uses two predicates. The predicate [figure] selects all sections that contain a figure, and the predicate [1] selects the first of those sections. Similarly, you can use either one of the following two XPath expressions to retrieve the last <figure> element from our sample document:

($doc/article/section/figure)[last()]
$doc/article/section[figure][last()]/figure[last()]

With the attached script you can try it out yourself – in DB2 for Linux, UNIX, and Windows or DB2 for z/OS. (Click on the “source” link below.)

 

create table mytest(mydoc XML);
 

insert into mytest(mydoc) values('<article id="62083">
 <title>A short story about apple pie</title>
 <pubdate>2009-08-21T09:30:00</pubdate>
 <section>
  <body>Once upon a time...</body>
 </section>
 <section>
  <body>...</body>
  <figure>drawing15.jpg</figure>
 </section>
 <section>
  <body>...</body>
  <figure>screenshot85.tif</figure>
 </section>
 <section>
  <body>...</body>
 </section>
</article>');
 

-- SQL/XML (DB2 for z/OS and DB2 for Linux, UNIX, and Windows):

SELECT XMLQUERY('$doc/article/section/figure[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY('$doc/article/section/figure[last()]' 
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY(' ($doc/article/section/figure)[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY(' ($doc/article/section/figure)[last()]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY('$doc/article/section[figure][1]/figure[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

-- XQuery (DB2 for Linux, UNIX, and Windows only):

xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure[1];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure[last()];
xquery (db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure)[1];
xquery (db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure)[last()];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section[figure][1]/figure[1];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section[figure][last()]/figure[last()];

In my previous post I have started discussing the XML features in mySQL and I have pointed out some of the key differences to DB2 pureXML. To complete this topic, let’s look at shredding and constructing XML data in mySQL.

Shredding with the LOAD_XML statement

The LOAD_XML command in mySQL enables you to convert XML to relational rows and to insert these rows into a table (shredding). The key restriction is that the input XML format has to be flat and regular so that the elements trivially map to a relational target table. The column names in the target table must match the tag names in the XML data. For example, if you want to shred name information to relational columns that are called “firstname” and “lastname”, then the XML input must have one of the following three forms:

–option 1:

<row firstname="John" lastname="Doe"/>

–option 2:

<row>
     <firstname>John</firstname>
     <lastname>Doe</lastname>
</row>

–option 3:

<row>
     <field name='firstname'>John</field>
     <field name='lastname'>Doe</field>
</row>

My experience is that in those XML applications where shredding is feasible, the tag names in the incoming XML data are typically not identical to the column names of the target tables. Hence, the LOAD_XML command seems too restrictive for many use cases. The shredding features in DB2 (described in the sample chapter of the DB2 pureXML Cookbook) don’t have any such restrictions and allow arbitrary mappings from XML to relational. Of course, in DB2 you may also prefer the use of XML columns instead of shredding.

Exporting relational data to a fixed XML format

The mysql and mysqldump clients support the command line option

--xml

With this option, any relational result set is tagged up in a default XML format. One <row> element is produced for each row of data, and one <field> element for each column in the result set. This is a fixed XML format and I didn’t find a way to customize it.

Constructing custom XML from relational tables

If you need to produce customized XML output with your own choice of tag names and nesting, you would normally use the XML construction functions that are part of the standard SQL language. These functions are available in DB2, but not in mySQL.

In mySQL you can try text concatenation instead, like this:

SELECT CONCAT('\n<myorder id="15">\n',
              GROUP_CONCAT('  <item>', iname, '</item>\n' SEPARATOR ''),
               '</myorder>') AS xmldoc
FROM order
WHERE oid = 15

–Result:

<myorder id="15">
  <item>Laptop</item>
  <item>Bag</item>
</myorder>

The function GROUP_CONCAT aggregates text from multiple rows, as a workaround for the lack of a proper XMLAGG function, which is defined in the SQL standard. The problem with text concatenation is that mySQL does not check whether the constructed XML is well-formed at all.

As an alternative you can use the 3rd-party library lib_mysqludf_xql. This library offers XML construction functions such as xql_element, xql_forest, xql_agg. These functions intend to mimick XML functions in the SQL language standard, such as XMLELEMENT, XMLFOREST, or XMLAGG. However, a closer look at the xql library reveals that the syntax and arguments of the xql functions differ so much from the SQL standard, that I would consider them proprietary and not open.

Neither the text concatenation functions nor the xql library enable you to explicitly construct namespace declarations and guarantee the correct usage of namespaces within constructed XML documents. In DB2, the SQL standard function XMLNAMESPACES is available to ensure correct construction of XML data with namespaces.

Conclusion

The mySQL community has begun only recently to add XML features to mySQL. Therefore the current XML support in mySQL is, in my personal opinion, still quite limited. Although this might change in the future, you may prefer to use DB2 for your XML applications. DB2 Express-C is freely available, just like mySQL. The rich XML features in DB2 enable you to avoid costly XML manipulation in application code, which leads to better performance, less code, and easier application maintenance over time.

XML Support in mySQL

December 23, 2009

Considering that mySQL is a quite popular database for web applications, you may be surprised that its XML support is quite basic, compared to other databases. If you need a free database that handles XML well, DB2 Express-C is the better choice, in my personal opinion. Before going into feature details, let’s start with a high level look at what is and isn’t available in mySQL version 5.1, 5.5 and 6.0.

The XML support in mySQL is subject to the following restrictions:

  • No XML data type, and hence no XML columns.
  • No support for the SQL/XML or XQuery standards.
  • Only a subset of XPath is supported.
  • No XQuery Update expressions.
  • No XML indexes.
  • No support for XML Schemas or document validation.
  • No support for namespaces, such as default namespaces or namespace declarations in queries.
  • Cannot extract XML fragments or XML attribute values from a document – can only extract atomic XML element values (text nodes).
  • No built-in function for XSLT.

These features are missing in mySQL but they are available in DB2 Express-C, for free.

The XML support in mySQL includes:

  • Storage of XML as text in LOB columns.
    • Basic support for XPath to extract atomic values from XML documents in text form (ExtractValue() ).
    • An UpdateXML() function to replace XML elements within an XML document in text form.
  • Shredding of XML into a single table, but only if (a) the XML is flat and regular so that it trivially maps to a relational table, and (b) the XML tag names match the column names in the target table. Doesn’t strike me as very flexible.
  • Converting XML to relational format
    • Exporting relational row sets to a fixed predefined XML format.
    • Constructing custom XML from relational tables through the use of string concatenation functions or external libraries.

Let’s look at the LOB storage and the ExtractValue() and UpdateXML() functions in mySQL now. (I will discuss the shredding and XML construction features in mySQL in a subsequent blog post.)

Storage of XML as text in LOB columns

In mySQL you can use the LOAD_FILE() function to insert individual XML files into LOB columns. This approach treats the XML data as if it was arbitrary text. The XML structure is ignored and not checked for well-formedness. In DB2 you would use an XML column instead of a LOB column. DB2’s XML columns store XML natively (not as text), ensure well-formedness, and enable high XML query and update performance.

Basic XPath and the ExtractValue() function

The proprietary ExtractValue() function in mySQL takes two string arguments. The first is an XML document in text format, the second is the XPath expression that you want to evaluate. The function returns the first text node under each element that matches the XPath expression. For example:

ExtractValue('<a><b><c>5</c></b></a>' , '/a/b/c')

returns the string “5″. If multiple matches are found then the values are returned in a single, space-delimited string:

ExtractValue('<a><b>1</b><b>2</b><b>3</b></a>' , '//b')

returns the string “1 2 3″. The space-delimited output might lead to problems. For example, both of the following function calls return the same result string “Laptop Bag”:

ExtractValue('<order><item>Laptop</item><item>Bag</item></order>',
             '/order/item' )
ExtractValue('<order><item>Laptop Bag</item></order>',
             '/order/item' )

But, the result “Laptop Bag” doesn’t tell you whether an order contains two items (a laptop and a bag for it) or just a single item (a laptop bag). In DB2 you can use the SQL/XML function XMLTABLE to avoid such problems.

Also note that ExtractValue() only extracts atomic values. It doesn’t let you extract XML fragments or attributes. Take the following as an example:

ExtractValue('<a><b><c>5</c></b></a>' , '/a/b')

This expression returns an empty string, which may seem incorrect because the XPath data model defines the value of the element <b> in this example to be ‘5′. The empty result makes more sense when you know that mySQL always appends /text() to your XPath! mySQL evaluates /a/b/text() instead of /a/b. This also implies that you cannot extract and return attribute values!
In DB2 you can certainly choose and execute any of /a/b, /a/b/text(), and /a/@b, to retrieve elements, document fragments, text nodes, or attribute values from your XML documents, in compliance with the XPath and SQL/XML standards.

The UpdateXML() function

The proprietary UpdateXML() function in mySQL takes three string arguments: 

  1. an XML document
  2. an XPath expression
  3. a new value or XML fragment.

The returned value is the updated XML document string in which the element identified by the XPath has been replaced by the value or XML fragment in the 3rd argument.

As far as I can tell, the UpdateXML() function has several limitations:

  • You cannot modify multiple occurrences of the same element within a document.
  • The function does not check whether the replacement fragment or the final updated document are well-formed. Hence, you might inadvertently turn an existing well-formed document into a non-well-formed document! This can be disastrous and render the document useless, because it can then no longer be parsed and manipulated with XML functions.
  • The function does not observe namespaces and may produce incorrect namespace bindings in the document.

The XML update capabilities in DB2 follow the rich XQuery Update standard and are not subject to any of these restrictions. In particular, DB2 does not let you destroy your well-formed XML documents or the consistency of their namespaces. This is very important, I think.

In my next post I will discuss the shredding and XML construction features in mySQL, as well as a conclusion. If you find that I have misrepresented anything, please do let me know and I will correct it.

Merry Christmas!

Resources

http://www.databasejournal.com/features/mysql/article.php/3846526/Working-with-XML-Data-in-MySQL.htm
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
http://dev.mysql.com/doc/refman/5.5/en/xml-functions.html

No, this post is not really about the movie “The Full Monty”. But, if you like British humor you might like to watch that movie someday.

I’m following some of the ongoing debate about Oracle’s acquisition of Sun and the dispute over mySQL. The European Commission has not dropped its concerns about Oracle owning mySQL. What will Oracle do if the deal goes through? One can only wonder. You can take a look at Conor’s poll results to get a (non-scientific) read of what people think. Interestingly, the word is that Oracle has asked many of their big customers to write to the European Commission to push for unconditional acceptance of the Sun acquisition, including mySQL. This in turn has motivated Monty Widenius, the founding father of mySQL, to reach out to the mySQL community. In his blog he encourages mySQL users to also write to the European Commission. However, mySQL users ask that either mySQL be split off into a separate company or that Oracle be required to give legally binding guarantees for the “freedom” of mySQL. And so the saga continues

Coming back to XML, I think it’s very interesting to look again at the XML features in mySQL. You may find that they are quite rudimentary compared to DB2 pureXML. I’ll provide more details on that in the next post.

The statistics for this blog show that there is a quite significant number of readers (you!) and they might want to know more about the new author of this blog. For starters you’re welcome to check out my homepage. I’m in the DB2 development team that’s located at IBM’s Silicon Valley Lab in San Jose. I have been working on DB2 pureXML since the very early stages of this new and exciting feature set, and I have a passion for XML and database technologies. After DB2 pureXML has been released in 2006, I have been working with a lot of our clients, helping them design, implement, deploy, and optimize their XML-based applications. As I continue to work in the XML space, this blog is a great place to share some of the observations and experiences.

Inevitably, the style of this blog will change a little. On occasions I will dive a little deeper into technical questions and maybe discuss some coding examples with XPath, XQuery, SQL/XML, Java or other languages. But, I will also share my general views on what’s happening in the world of XML and databases. And for those of you who also want to follow Conor’s view of the world, he is literally only one click away – at the db2news blog !

- Matthias

Welcome Matthias Nicola!

December 13, 2009

I’ve got some great news for readers of this blog. Matthias Nicola is going to take over ownership of this blog. Many of you will know Matthias as one of the leading DB2 pureXML experts at IBM. I have featured Matthias several times on this blog, including a blog with a short video of him presenting at the IDUG Conference this year and several posts about his recently published DB2 pureXML Cookbook.

Recently, Matthias told me of his desire to start blogging. I am delighted that I could persuade him to bring his knowledge and talents to this blog. Thanks Matthias… and welcome!

JavaScript Object Notation (JSON) is a popular way for servers and clients to exchange information. It uses serialized text to represent objects (or data structures). If you are interested in using JSON when you work with DB2 pureXML, there is a new series of articles you should check out.

These articles teach you how DB2 pureXML can store, manage, and query JSON. The articles include downloadable sample code and step-by-step instructions. They also describe the benefits of using pureXML to store JSON.

Part 3 will focus on the creation of the presentation layer with Open-Social Gadgets that rely on the JSONx Universal Services as a back-end.

The adoption of XML-based standards in many industries has had a significant impact on native XML database adoption. After all, the processing efficiencies offered by a native XML database like DB2 pureXML are vital as systems scale up and out. Readers of the blog will also know that IBM has made it easy for DB2 users to work with many XML-based industry standards by freely distributing working sample implementations of these standards.

Of course, IBM supports such standards by making its industry experts available for participation in many of the standards committees. This week IBM further demonstrated its support for the Association for Cooperative Operations Research and Development (ACORD) and its insurance and financial services standards by donating technology assets including its Insurance Application Architecture (IAA) Business Object/Data Model and the IAA Product Specification Diagram (PSD) to ACORD.

This represents a significant contribution of intellectual property associated with IBM’s insurance industry models. The donated models are from IBM’s Insurance Application Architecture (IAA), the company’s insurance business and IT architecture framework. IAA is continually developed by IBM in collaboration with 100 leading insurance companies around the world, and the IAA model has been licensed by more than 200 insurers worldwide. IAA Business Object/Data Model, coupled with the existing ACORD Information Model, will form the building blocks for the next version of the ACORD Information Model.

This announcement builds on efforts to encourage collaboration and drive innovation in the insurance industry. The adoption of standardized business processes and models will make it easier for insurers to work with agents, brokers and other data partners.

You can read more about this announcement at IBM Contributes Technology Assets to Help Drive Standards in Insurance Industry. You can read more about implementing ACORD standards in DB2 pureXML at Storing and Retrieving ACORD Data for Insurance

Bryan Patterson has written an excellent step-by-step article about creating an electronic forms solution on Adobe Developer Connection. The article guides you through creating a simple yet powerful eForms solution based on Adobe LiveCycle Designer ES and IBM DB2 pureXML. The article includes links to download all software needed to automate:

  • The collection of user data using electronic forms
  • The transmission of user data using XML, and
  • The storage of user data using a database.

Because all parts of the solution use XML, there is no need for complex data mapping or conversion steps between components. The XML data format used in this example implementation is a very simple structure but you can easily expand the format to meet specific needs or even base the format on one of the many XML-based industry standards for data exchange such as NIEM for government, ACORD for insurance, or FIXML for financial markets. You can read more at Creating an XML electronic forms solution with an Adobe PDF form and IBM DB2 pureXML.

The IBM Information on Demand conference is upon us again. It is being held next week in Las Vegas. Once again, if you are interested in native XML databases, there is an exciting line-up of activities and sessions. Here are some of the highlights:

  • Ask The Experts
    • You can freely schedule 1×1 sessions with DB2 pureXML experts from IBM.
  • Business Track
    • How Verizon Business Streamlined their Order Management System, featuring Andrew Washburn (Verizon)
    • How BJC HealthCare? is Using IBM DB2 pureXML to Improve Medical Research, featuring Tom Holdener (BJC Healthcare)
  • Technical Track
    • Developing DB2 pureXML Applications with COBOL, Java, and .NET: Techniques and a Use Case, featuring Kal Mirza (Barclays Wealth)
    • Implementing an Enterprise Order Database with DB2 pureXML at Verizon, featuring Andrew Washburn (Verizon)
    • Taking XML to the Data Warehouse with Intel and DB2, featuring Agustin Gonzalez (Intel)
    • Querying Large Medical Data Sets using IBM DB2 pureXML, featuring Tom Holdener (BJC Healthcare)
    • XML and DB2 pureXML for Beginners
    • SOA and IBM DB2 pureXML: The Role of DB2 in an Innovative Architecture
    • IBM DB2 pureXML in DB2 for z/OS: Exciting Enhancements and New Features
    • Customer Experiences and Case Studies of IBM DB2 pureXML in DB2 9 for z/OS
  • Hands on Labs
    • Introduction to IBM DB2 pureXML
    • New and Advanced Features of IBM DB2 pureXML

I am enjoying several “email conversations” after last week’s Electronic Health Records for Smarter Healthcare Webinar. In one of those threads, Paul Grundy is sharing his experiences with the patient-centered medical home movement. (You can find Paul blogging at Healthnex.) The patient-centered medical home involves leveraging technology to improve the efficiency and quality of patient care. It is a great example of transforming healthcare from the ground-up, and something I would love to see in my community because I genuinely believe that it would improve the quality of healthcare my family receives. Of course, XML standard-based formats and technologies make this movement possible. You can see great videos about patient-centered medical home at:

The organization promoting this concept are the Patient-Centered Primary Care Collaborative. They are a coalition of more than 600 major employers, consumer groups, organizations representing primary care physicians, and others who have joined to advance the concept of a patient-centered medical home. I genuinely hope that their efforts bear fruit.

Electronic Health Records are a hot topic at the moment. The US federal government has set aside $19 billion in an economic stimulus package to create an electronic health record for every American by 2014. The government is not only using incentives to encourage adoption; they are also using penalties. Between using the carrot and the stick, the US federal government is determined to bring this wave of technology into mass adoption in the healthcare industry.

Next week, I will join Robert Abate to deliver an ‘Espresso Webcast’ about the advantages of implementing standards-based infrastructure for Electronic Health Records (EHRs) and Electronic Medical Records (EMRs). We will also discuss the considerations you need to be aware of as you work with the infrastructure for electronic health systems. Espresso Webcasts are slightly shorter than typical Webcasts, lasting about 35 minutes or so.

The Webcast will be on Tuesday 01 September at 12pm ET. You can register at Electronic Health Records for Smarter Healthcare.

I have previously covered how DB2 9.7 supports native XML data with hash partitioning (database partitioning), range partitioning (table partitioning), and multi-dimensional clustering. These new features make it feasible to analyze information in native XML format, side-by-side with relational data, in a data warehouse. And, of course, being able to work with native XML data in such scenarios offers many efficiencies and advantages.

In reality, many data warehouse projects involve pulling different types of information from disparate data sources around an organization. My colleagues have published the first in a series of two articles that provide step-by-step instructions for integrating information from such disparate sources into a data warehouse. The first of those articles is now available at IBM InfoSphere DataStage and DB2 pureXML, Part 1: Integrate XML operational data into a data warehouse.

This article tells you how to use IBM® InfoSphere™ DataStage to extract and transform XML data managed by DB2® pureXML®. It also explores how DataStage can load this data into a table with traditional SQL data types, and a table with both relational and XML columns. The article includes sample scripts and data that you can download.

The second part of this article series will explore another important scenario: using DataStage to read information from a flat file, convert the data into an XML format, and load this XML data into a data warehouse that contains a table with a DB2 pureXML column.

In the past, I discussed the DB2 pureXML Cookbook. This book is very valuable for all DB2 pureXML users, from novice through expert.

If you are interested in buying this book, please be aware that International DB2 User Group (IDUG) members get a 45% discount on IBM Press books. IDUG membership is free. For information about how to get the discount, visit the following Web page: http://www.idug.org/public-spotlights/45-book-discount.html

Also, as a special promotion, IDUG have a competition where they are giving away 3 copies of this book as prizes. For information about entering to win a free copy of the book, visit the following Web page: http://www.idug.org/public-spotlights/free-db2-book.html

If you are implementing a SOA environment, Solitaire has a very interesting finding for you. Solitaire authored a whitepaper where they analyze database operations at more than 4,100 production systems. As part of their analysis of database operations on IBM System p, they looked at the correlation between the success rate of SOA projects and the choice of database software.

To classify a SOA project as successful, they asked the organization if they now enjoy a 25% or more increase in resource utilization and a 30% or more increase in the speed of provisioning. Here is a chart that shows the relative success rates for SOA projects that involve IBM DB2 and Oracle Database. Solitaire do not say why DB2 does so much better. Perhaps DB2’s superior native XML storage is a factor?

Database Choice for Successful SOA Projects

You can read the full Solitaire Report at Whitepaper: DB2 Performance on IBM System p® and System x®.

Almost 24 million people in the US are diagnosed with diabetes. If you know someone with diabetes, you know about the hassles that constant monitoring imposes. MyCareTeam and IBM have collaborated to improve continuous monitoring in such situations, with a solution that both reduces costs and improves the quality of healthcare. I am particularly interested in this collaboration because it involves the use of XML data. IBM and MyCareTeam have written a great paper that covers a number of topics that will be of interest to those in the diabetes and healthcare technologies fields. For instance, there is information about the use of technologies like XML storage and Web services in the context of continuing care. There is also information about related initiatives such as the Continua Health Alliance’s role in selecting appropriate standards. You can read more at Healthcare in the Home: Continuing Care for Diabetes with Collaborative Technologies.