XML Support in mySQL
December 23, 2009
Considering that mySQL is a quite popular database for web applications, you may be surprised that its XML support is quite basic, compared to other databases. If you need a free database that handles XML well, DB2 Express-C is the better choice, in my personal opinion. Before going into feature details, let’s start with a high level look at what is and isn’t available in mySQL version 5.1, 5.5 and 6.0.
The XML support in mySQL is subject to the following restrictions:
- No XML data type, and hence no XML columns.
- No support for the SQL/XML or XQuery standards.
- Only a subset of XPath is supported.
- No XQuery Update expressions.
- No XML indexes.
- No support for XML Schemas or document validation.
- No support for namespaces, such as default namespaces or namespace declarations in queries.
- Cannot extract XML fragments or XML attribute values from a document – can only extract atomic XML element values (text nodes).
- No built-in function for XSLT.
These features are missing in mySQL but they are available in DB2 Express-C, for free.
The XML support in mySQL includes:
- Storage of XML as text in LOB columns.
- Basic support for XPath to extract atomic values from XML documents in text form (ExtractValue() ).
- An UpdateXML() function to replace XML elements within an XML document in text form.
- Shredding of XML into a single table, but only if (a) the XML is flat and regular so that it trivially maps to a relational table, and (b) the XML tag names match the column names in the target table. Doesn’t strike me as very flexible.
- Converting XML to relational format
- Exporting relational row sets to a fixed predefined XML format.
- Constructing custom XML from relational tables through the use of string concatenation functions or external libraries.
Let’s look at the LOB storage and the ExtractValue() and UpdateXML() functions in mySQL now. (I will discuss the shredding and XML construction features in mySQL in a subsequent blog post.)
Storage of XML as text in LOB columns
In mySQL you can use the LOAD_FILE() function to insert individual XML files into LOB columns. This approach treats the XML data as if it was arbitrary text. The XML structure is ignored and not checked for well-formedness. In DB2 you would use an XML column instead of a LOB column. DB2′s XML columns store XML natively (not as text), ensure well-formedness, and enable high XML query and update performance.
Basic XPath and the ExtractValue() function
The proprietary ExtractValue() function in mySQL takes two string arguments. The first is an XML document in text format, the second is the XPath expression that you want to evaluate. The function returns the first text node under each element that matches the XPath expression. For example:
ExtractValue('<a><b><c>5</c></b></a>' , '/a/b/c')
returns the string “5″. If multiple matches are found then the values are returned in a single, space-delimited string:
ExtractValue('<a><b>1</b><b>2</b><b>3</b></a>' , '//b')
returns the string “1 2 3″. The space-delimited output might lead to problems. For example, both of the following function calls return the same result string “Laptop Bag”:
ExtractValue('<order><item>Laptop</item><item>Bag</item></order>', '/order/item' ) ExtractValue('<order><item>Laptop Bag</item></order>', '/order/item' )
But, the result “Laptop Bag” doesn’t tell you whether an order contains two items (a laptop and a bag for it) or just a single item (a laptop bag). In DB2 you can use the SQL/XML function XMLTABLE to avoid such problems.
Also note that ExtractValue() only extracts atomic values. It doesn’t let you extract XML fragments or attributes. Take the following as an example:
ExtractValue('<a><b><c>5</c></b></a>' , '/a/b')
This expression returns an empty string, which may seem incorrect because the XPath data model defines the value of the element <b> in this example to be ’5′. The empty result makes more sense when you know that mySQL always appends /text() to your XPath! mySQL evaluates /a/b/text() instead of /a/b. This also implies that you cannot extract and return attribute values!
In DB2 you can certainly choose and execute any of /a/b, /a/b/text(), and /a/@b, to retrieve elements, document fragments, text nodes, or attribute values from your XML documents, in compliance with the XPath and SQL/XML standards.
The UpdateXML() function
The proprietary UpdateXML() function in mySQL takes three string arguments:
- an XML document
- an XPath expression
- a new value or XML fragment.
The returned value is the updated XML document string in which the element identified by the XPath has been replaced by the value or XML fragment in the 3rd argument.
As far as I can tell, the UpdateXML() function has several limitations:
- You cannot modify multiple occurrences of the same element within a document.
- The function does not check whether the replacement fragment or the final updated document are well-formed. Hence, you might inadvertently turn an existing well-formed document into a non-well-formed document! This can be disastrous and render the document useless, because it can then no longer be parsed and manipulated with XML functions.
- The function does not observe namespaces and may produce incorrect namespace bindings in the document.
The XML update capabilities in DB2 follow the rich XQuery Update standard and are not subject to any of these restrictions. In particular, DB2 does not let you destroy your well-formed XML documents or the consistency of their namespaces. This is very important, I think.
In my next post I will discuss the shredding and XML construction features in mySQL, as well as a conclusion. If you find that I have misrepresented anything, please do let me know and I will correct it.