Shredding and constructing XML in mySQL

January 4, 2010

In my previous post I have started discussing the XML features in mySQL and I have pointed out some of the key differences to DB2 pureXML. To complete this topic, let’s look at shredding and constructing XML data in mySQL.

Shredding with the LOAD_XML statement

The LOAD_XML command in mySQL enables you to convert XML to relational rows and to insert these rows into a table (shredding). The key restriction is that the input XML format has to be flat and regular so that the elements trivially map to a relational target table. The column names in the target table must match the tag names in the XML data. For example, if you want to shred name information to relational columns that are called “firstname” and “lastname”, then the XML input must have one of the following three forms:

–option 1:

<row firstname="John" lastname="Doe"/>

–option 2:

<row>
     <firstname>John</firstname>
     <lastname>Doe</lastname>
</row>

–option 3:

<row>
     <field name='firstname'>John</field>
     <field name='lastname'>Doe</field>
</row>

My experience is that in those XML applications where shredding is feasible, the tag names in the incoming XML data are typically not identical to the column names of the target tables. Hence, the LOAD_XML command seems too restrictive for many use cases. The shredding features in DB2 (described in the sample chapter of the DB2 pureXML Cookbook) don’t have any such restrictions and allow arbitrary mappings from XML to relational. Of course, in DB2 you may also prefer the use of XML columns instead of shredding.

Exporting relational data to a fixed XML format

The mysql and mysqldump clients support the command line option

--xml

With this option, any relational result set is tagged up in a default XML format. One <row> element is produced for each row of data, and one <field> element for each column in the result set. This is a fixed XML format and I didn’t find a way to customize it.

Constructing custom XML from relational tables

If you need to produce customized XML output with your own choice of tag names and nesting, you would normally use the XML construction functions that are part of the standard SQL language. These functions are available in DB2, but not in mySQL.

In mySQL you can try text concatenation instead, like this:

SELECT CONCAT('\n<myorder id="15">\n',
              GROUP_CONCAT('  <item>', iname, '</item>\n' SEPARATOR ''),
               '</myorder>') AS xmldoc
FROM order
WHERE oid = 15

–Result:

<myorder id="15">
  <item>Laptop</item>
  <item>Bag</item>
</myorder>

The function GROUP_CONCAT aggregates text from multiple rows, as a workaround for the lack of a proper XMLAGG function, which is defined in the SQL standard. The problem with text concatenation is that mySQL does not check whether the constructed XML is well-formed at all.

As an alternative you can use the 3rd-party library lib_mysqludf_xql. This library offers XML construction functions such as xql_element, xql_forest, xql_agg. These functions intend to mimick XML functions in the SQL language standard, such as XMLELEMENT, XMLFOREST, or XMLAGG. However, a closer look at the xql library reveals that the syntax and arguments of the xql functions differ so much from the SQL standard, that I would consider them proprietary and not open.

Neither the text concatenation functions nor the xql library enable you to explicitly construct namespace declarations and guarantee the correct usage of namespaces within constructed XML documents. In DB2, the SQL standard function XMLNAMESPACES is available to ensure correct construction of XML data with namespaces.

Conclusion

The mySQL community has begun only recently to add XML features to mySQL. Therefore the current XML support in mySQL is, in my personal opinion, still quite limited. Although this might change in the future, you may prefer to use DB2 for your XML applications. DB2 Express-C is freely available, just like mySQL. The rich XML features in DB2 enable you to avoid costly XML manipulation in application code, which leads to better performance, less code, and easier application maintenance over time.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 51 other followers

%d bloggers like this: