How to list the paths of all elements in an XML document?
February 4, 2012
A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling”, and in a previous blog post I discussed several SQL/XML queries that can do this.
An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is, unfortunately, not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.
The solution is to write a query that traverses the XML document level by level, collects the element and attribute names at every level, and concatenates them appropriately to construct the path for every element and attribute.
There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.
First, let’s create a simple table with a small document that we can use in the examples:
create table mytable(xmldoc XML);

insert into mytable values(
'<Message>
   <Type>Urgent</Type>
   <Person id ="123">
     <FirstName>Robert</FirstName>
     <LastName>Tester</LastName>
   </Person>
 </Message>');
A first and straightforward solution is to start at the root of the document, then move to the first level of child nodes, then to the children of each of these child nodes, and so on. For each element or attribute we construct the path by concatenating the path from the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document:
xquery
for $L1 in db2-fn:xmlcolumn("MYTABLE.XMLDOC")/*
let $L1path := fn:string-join( ($L1/local-name() ),"/" )
return (
  $L1path,
  for $L2 in $L1/(*,@*)
  let $L2path := fn:string-join( ($L1path, $L2/local-name() ),"/" )
  return (
    $L2path,
    for $L3 in $L2/(*,@*)
    let $L3path := fn:string-join( ($L2path, $L3/local-name() ),"/" )
    return (
      $L3path,
      for $L4 in $L3/(*,@*)
      let $L4path := fn:string-join( ($L3path, $L4/local-name() ),"/" )
      return (
        $L4path,
        for $L5 in $L4/(*,@*)
        let $L5path := fn:string-join( ($L4path, $L5/local-name() ),"/" )
        return ($L5path)))));

Message
Message/Type
Message/Person
Message/Person/id
Message/Person/FirstName
Message/Person/LastName

  6 record(s) selected.
The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than this, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That may not be very elegant, but it works if you can define an upper bound on the depth of your XML documents, which is usually possible.
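To make the effect of the fixed depth limit concrete outside the database, here is a minimal Python sketch of the same level-by-level idea. It is not part of the DB2 solution; the function name list_paths and the max_depth parameter are made up for this illustration, and it assumes a document without namespaces, so that the tag name equals the local name.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as a plain Python string.
DOC = """<Message>
  <Type>Urgent</Type>
  <Person id="123">
    <FirstName>Robert</FirstName>
    <LastName>Tester</LastName>
  </Person>
</Message>"""


def list_paths(xml_text, max_depth=5):
    """Collect element and attribute paths, going at most max_depth levels deep."""
    root = ET.fromstring(xml_text)
    paths = []
    # (node, path) pairs for the current level; node is None for attributes.
    level = [(root, root.tag)]
    for _ in range(max_depth):
        next_level = []
        for node, path in level:
            paths.append(path)
            if node is None:
                continue  # attributes have no children
            # Attributes first, then child elements (document order).
            for name in node.attrib:
                next_level.append((None, path + "/" + name))
            for child in node:
                next_level.append((child, path + "/" + child.tag))
        level = next_level
    return paths


all_paths = list_paths(DOC)                   # all six paths of the sample
shallow_paths = list_paths(DOC, max_depth=2)  # deeper nodes are silently missing
```

With the default of five levels the sketch returns the same six paths as the query above. With max_depth=2 it returns only Message, Message/Type and Message/Person, which is exactly the kind of silent truncation to watch for when a document is deeper than the query expects.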
You probably noticed that the path Message/Person/id should actually be Message/Person/@id because "id" is an XML attribute. The query can be enhanced to take care of such details. The last two sample queries of my XML profiling post show how to use the self::attribute() test for this purpose.
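As a rough illustration of that distinction (in Python rather than XQuery, and not the query from the profiling post), the following sketch builds the path steps one level below an element and prefixes attribute names with @. The helper name child_steps is hypothetical, and the Message/Person prefix is written out by hand for the example.

```python
import xml.etree.ElementTree as ET


def child_steps(elem):
    """Return the path steps one level below elem, with attributes written as @name."""
    steps = ["@" + name for name in elem.attrib]  # attributes get the @ prefix
    steps += [child.tag for child in elem]        # child elements keep their plain name
    return steps


person = ET.fromstring(
    '<Person id="123"><FirstName>Robert</FirstName><LastName>Tester</LastName></Person>'
)
paths = ["Message/Person/" + step for step in child_steps(person)]
```

For the Person element of the sample document this yields Message/Person/@id, Message/Person/FirstName and Message/Person/LastName.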
If you prefer a more elegant solution that does not require any assumption about the maximum depth of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let's try SQL/XML for a change.
You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called "common table expression", that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following augments this approach with the XMLTABLE function that extracts nodes and node names from the XML:
WITH pathstable (name, node, xpath) AS (
  SELECT x.name AS name, x.node AS xmlnode, '/' || x.name AS xpath
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
         COLUMNS
           name varchar(30) PATH './local-name()',
           node XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode, xpath || '/' || y.name AS xpath
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
         COLUMNS
           name varchar(30) PATH 'local-name()',
           node XML         PATH '.') AS y
)
SELECT name, xpath
FROM pathstable;

NAME                           XPATH
------------------------------ -------------------------------
Message                        /Message
Type                           /Message/Type
Person                         /Message/Person
id                             /Message/Person/id
FirstName                      /Message/Person/FirstName
LastName                       /Message/Person/LastName

  6 record(s) selected
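If recursive SQL is new to you, it may help to see the same recursion written as an ordinary function. The following Python sketch is only an analogy, not how DB2 evaluates the query: the first call plays the role of the seed SELECT, and each nested call plays the role of the recursive branch that joins a node to its children and attributes. The function name path_rows is invented for this example, and the sketch assumes a document without namespaces.

```python
import xml.etree.ElementTree as ET

DOC = """<Message>
  <Type>Urgent</Type>
  <Person id="123">
    <FirstName>Robert</FirstName>
    <LastName>Tester</LastName>
  </Person>
</Message>"""


def path_rows(elem, parent_xpath=""):
    """Return (name, xpath) rows for elem and for everything below it."""
    xpath = parent_xpath + "/" + elem.tag
    rows = [(elem.tag, xpath)]
    # Attributes are leaves, so they only contribute one row each.
    for name in elem.attrib:
        rows.append((name, xpath + "/" + name))
    # Recursive step: every child extends the path of its parent.
    for child in elem:
        rows.extend(path_rows(child, xpath))
    return rows


rows = path_rows(ET.fromstring(DOC))
```

For the sample document, rows contains the same six (name, xpath) pairs as the result table above, with no limit on the depth other than Python's own recursion limit.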
If you want to list the element and attribute values for each path, then you can easily modify this query as follows:
WITH pathstable (name, node, xpath, value) AS (
  SELECT x.name AS name, x.node AS xmlnode, '/' || x.name AS xpath, x.value AS value
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
         COLUMNS
           name  varchar(30) PATH './local-name()',
           value varchar(20) PATH 'xs:string(.)',
           node  XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode, xpath || '/' || y.name AS xpath, y.value AS value
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
         COLUMNS
           name  varchar(30) PATH 'local-name()',
           value varchar(20) PATH 'xs:string(.)',
           node  XML         PATH '.') AS y
)
SELECT xpath, value
FROM pathstable;

XPATH                           VALUE
------------------------------- --------------------
/Message                        UrgentRobertTester
/Message/Type                   Urgent
/Message/Person                 RobertTester
/Message/Person/id              123
/Message/Person/FirstName       Robert
/Message/Person/LastName        Tester

  6 record(s) selected
A few things to note:
- The value of an XML element is defined as the concatenation of all text nodes in the subtree under that element. This explains the values that you see for /Message and /Message/Person in the example above.
- For longer paths you may need to increase the length of the VARCHAR(n) in the XMLTABLE function.
- In DB2 you may receive warning SQL0347W, which says that this query might recursively run into an infinite loop. However, this would only happen if your XML document were infinitely deep, which isn't possible, so you can safely ignore that warning.
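The first note, about element values, can be checked with a small Python sketch. It assumes that the whitespace between the elements of the sample document is not part of the stored document, which is what the values in the result above suggest; the helper name string_value is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document without whitespace between elements (an assumption, see above).
DOC = (
    '<Message><Type>Urgent</Type><Person id="123">'
    "<FirstName>Robert</FirstName><LastName>Tester</LastName>"
    "</Person></Message>"
)


def string_value(elem):
    """Concatenate all text nodes in the subtree under elem, in document order."""
    return "".join(elem.itertext())


root = ET.fromstring(DOC)
message_value = string_value(root)                # "UrgentRobertTester"
person_value = string_value(root.find("Person"))  # "RobertTester"
```

Note that the attribute value 123 does not appear in either string: attribute values are not text nodes, so they do not contribute to the value of the enclosing element.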