XML Support in mySQL

December 23, 2009

Considering that mySQL is a quite popular database for web applications, you may be surprised that its XML support is quite basic, compared to other databases. If you need a free database that handles XML well, DB2 Express-C is the better choice, in my personal opinion. Before going into feature details, let’s start with a high level look at what is and isn’t available in mySQL version 5.1, 5.5 and 6.0.

The XML support in mySQL is subject to the following restrictions:

  • No XML data type, and hence no XML columns.
  • No support for the SQL/XML or XQuery standards.
  • Only a subset of XPath is supported.
  • No XQuery Update expressions.
  • No XML indexes.
  • No support for XML Schemas or document validation.
  • No support for namespaces, such as default namespaces or namespace declarations in queries.
  • Cannot extract XML fragments or XML attribute values from a document – can only extract atomic XML element values (text nodes).
  • No built-in function for XSLT.

These features are missing in mySQL but they are available in DB2 Express-C, for free.

The XML support in mySQL includes:

  • Storage of XML as text in LOB columns.
    • Basic support for XPath to extract atomic values from XML documents in text form (ExtractValue() ).
    • An UpdateXML() function to replace XML elements within an XML document in text form.
  • Shredding of XML into a single table, but only if (a) the XML is flat and regular so that it trivially maps to a relational table, and (b) the XML tag names match the column names in the target table. Doesn’t strike me as very flexible.
  • Converting XML to relational format
    • Exporting relational row sets to a fixed predefined XML format.
    • Constructing custom XML from relational tables through the use of string concatenation functions or external libraries.

Let’s look at the LOB storage and the ExtractValue() and UpdateXML() functions in mySQL now. (I will discuss the shredding and XML construction features in mySQL in a subsequent blog post.)

Storage of XML as text in LOB columns

In mySQL you can use the LOAD_FILE() function to insert individual XML files into LOB columns. This approach treats the XML data as if it was arbitrary text. The XML structure is ignored and not checked for well-formedness. In DB2 you would use an XML column instead of a LOB column. DB2’s XML columns store XML natively (not as text), ensure well-formedness, and enable high XML query and update performance.

Basic XPath and the ExtractValue() function

The proprietary ExtractValue() function in mySQL takes two string arguments. The first is an XML document in text format, the second is the XPath expression that you want to evaluate. The function returns the first text node under each element that matches the XPath expression. For example:

ExtractValue('<a><b><c>5</c></b></a>' , '/a/b/c')

returns the string “5”. If multiple matches are found then the values are returned in a single, space-delimited string:

ExtractValue('<a><b>1</b><b>2</b><b>3</b></a>' , '//b')

returns the string “1 2 3”. The space-delimited output might lead to problems. For example, both of the following function calls return the same result string “Laptop Bag”:

ExtractValue('<order><item>Laptop</item><item>Bag</item></order>', 
             '/order/item' )
ExtractValue('<order><item>Laptop Bag</item></order>', 
             '/order/item' )

But, the result “Laptop Bag” doesn’t tell you whether an order contains two items (a laptop and a bag for it) or just a single item (a laptop bag). In DB2 you can use the SQL/XML function XMLTABLE to avoid such problems.

Also note that ExtractValue() only extracts atomic values. It doesn’t let you extract XML fragments or attributes. Take the following as an example:

ExtractValue('<a><b><c>5</c></b></a>' , '/a/b')

This expression returns an empty string, which may seem incorrect because the XPath data model defines the value of the element <b> in this example to be ‘5’. The empty result makes more sense when you know that mySQL always appends /text() to your XPath! mySQL evaluates /a/b/text() instead of /a/b. This also implies that you cannot extract and return attribute values!
In DB2 you can certainly choose and execute any of /a/b, /a/b/text(), and /a/@b, to retrieve elements, document fragments, text nodes, or attribute values from your XML documents, in compliance with the XPath and SQL/XML standards.

The UpdateXML() function

The proprietary UpdateXML() function in mySQL takes three string arguments: 

  1. an XML document
  2. an XPath expression
  3. a new value or XML fragment.

The returned value is the updated XML document string in which the element identified by the XPath has been replaced by the value or XML fragment in the 3rd argument.

As far as I can tell, the UpdateXML() function has several limitations:

  • You cannot modify multiple occurrences of the same element within a document.
  • The function does not check whether the replacement fragment or the final updated document are well-formed. Hence, you might inadvertently turn an existing well-formed document into a non-well-formed document! This can be disastrous and render the document useless, because it can then no longer be parsed and manipulated with XML functions.
  • The function does not observe namespaces and may produce incorrect namespace bindings in the document.

The XML update capabilities in DB2 follow the rich XQuery Update standard and are not subject to any of these restrictions. In particular, DB2 does not let you destroy your well-formed XML documents or the consistency of their namespaces. This is very important, I think.

In my next post I will discuss the shredding and XML construction features in mySQL, as well as a conclusion. If you find that I have misrepresented anything, please do let me know and I will correct it.

Merry Christmas!

Resources

http://www.databasejournal.com/features/mysql/article.php/3846526/Working-with-XML-Data-in-MySQL.htm
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
http://dev.mysql.com/doc/refman/5.5/en/xml-functions.html

No, this post is not really about the movie “The Full Monty”. But, if you like British humor you might like to watch that movie someday.

I’m following some of the ongoing debate about Oracle’s acquisition of Sun and the dispute over mySQL. The European Commission has not dropped its concerns about Oracle owning mySQL. What will Oracle do if the deal goes through? One can only wonder. You can take a look at Conor’s poll results to get a (non-scientific) read of what people think. Interestingly, the word is that Oracle has asked many of their big customers to write to the European Commission to push for unconditional acceptance of the Sun acquisition, including mySQL. This in turn has motivated Monty Widenius, the founding father of mySQL, to reach out to the mySQL community. In his blog he encourages mySQL users to also write to the European Commission. However, mySQL users ask that either mySQL be split off into a separate company or that Oracle be required to give legally binding guarantees for the “freedom” of mySQL. And so the saga continues

Coming back to XML, I think it’s very interesting to look again at the XML features in mySQL. You may find that they are quite rudimentary compared to DB2 pureXML. I’ll provide more details on that in the next post.

The statistics for this blog show that there is a quite significant number of readers (you!) and they might want to know more about the new author of this blog. For starters you’re welcome to check out my homepage. I’m in the DB2 development team that’s located at IBM’s Silicon Valley Lab in San Jose. I have been working on DB2 pureXML since the very early stages of this new and exciting feature set, and I have a passion for XML and database technologies. After DB2 pureXML has been released in 2006, I have been working with a lot of our clients, helping them design, implement, deploy, and optimize their XML-based applications. As I continue to work in the XML space, this blog is a great place to share some of the observations and experiences.

Inevitably, the style of this blog will change a little. On occasions I will dive a little deeper into technical questions and maybe discuss some coding examples with XPath, XQuery, SQL/XML, Java or other languages. But, I will also share my general views on what’s happening in the world of XML and databases. And for those of you who also want to follow Conor’s view of the world, he is literally only one click away – at the db2news blog !

– Matthias

Welcome Matthias Nicola!

December 13, 2009

I’ve got some great news for readers of this blog. Matthias Nicola is going to take over ownership of this blog. Many of you will know Matthias as one of the leading DB2 pureXML experts at IBM. I have featured Matthias several times on this blog, including a blog with a short video of him presenting at the IDUG Conference this year and several posts about his recently published DB2 pureXML Cookbook.

Recently, Matthias told me of his desire to start blogging. I am delighted that I could persuade him to bring his knowledge and talents to this blog. Thanks Matthias… and welcome!