How to list the paths of all elements in an XML document?

February 4, 2012

A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling” and in a previous blog post I have discussed several SQL/XML queries that can do this.

An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is -unfortunately- not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.

The solution is to write a query that traverses the XML document level by level to collect the element names at every level and concatenate them appropriately to construct the paths for every elements and attributes at every level.

There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.

First, let’s create a simple table with a small document that we can use in the examples:

create table mytable(xmldoc XML);

insert into mytable values(
'<Message>
   <Type>Urgent</Type>
   <Person id ="123">
     <FirstName>Robert</FirstName>
     <LastName>Tester</LastName>
   </Person>
 </Message>');

A first and straightforward solution is to start at the root of the document, then at the first level of child nodes, and then at the children of each these child nodes, and so on. For each element or attribute we construct the path by concatenating the path from the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document.:

xquery
for $L1 in db2-fn:xmlcolumn("MYTABLE.XMLDOC")/*
let $L1path := fn:string-join( ($L1/local-name() ),"/" )
return (
  $L1path,
  for $L2 in $L1/(*,@*)
  let $L2path := fn:string-join( ($L1path, $L2/local-name() ),"/" )
  return (
    $L2path,
    for $L3 in $L2/(*,@*)
    let $L3path := fn:string-join( ($L2path, $L3/local-name() ),"/" )
    return (

    $L3path,
    for $L4 in $L3/(*,@*)
    let $L4path := fn:string-join( ($L3path, $L4/local-name() ),"/" )
    return (

    $L4path,
    for $L5 in $L4/(*,@*)
    let $L5path := fn:string-join( ($L4path, $L5/local-name() ),"/" )
    return ($L5path)))));

Message
Message/Type
Message/Person
Message/Person/id
Message/Person/FirstName
Message/Person/LastName

6 record(s) selected.

The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than this, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That's maybe not very elegant, but it works if you can define an upper bound on the depths of your XML documents, which is usually possible.

You probably notice that the path Message/Person/id should actually be Message/Person/@id because "id" is an XML attribute. The query can enhanced to take care of such details. In the last two sample queries of my XML profiling post you have seen how to use the self::attribute() test for this purpose.

If you prefer a more elegant solution that does not require any assumption about the maximum depths of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let's try SQL/XML for a change.

You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called "common table expression", that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following  augments this approach with the XMLTABLE function that extracts nodes and node names from the XML:

WITH pathstable (name, node, xpath) AS (
  SELECT x.name AS name, x.node AS xmlnode,'/' || x.name AS xpath
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode, xpath|| '/' || y.name AS xpath
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
         name    varchar(30) PATH 'local-name()',
         node    XML         PATH '.') AS y
) SELECT name, xpath
  FROM pathstable;

NAME                           XPATH
------------------------------ -------------------------------
Message                        /Message
Type                           /Message/Type
Person                         /Message/Person
id                             /Message/Person/id
FirstName                      /Message/Person/FirstName
LastName                       /Message/Person/LastName

6 record(s) selected

If you want to list the element and attribute values for each path, then you can easily modify this query as follows:

WITH pathstable (name, node, xpath, value) AS (
  SELECT x.name AS name, x.node AS xmlnode,
         '/' || x.name AS xpath, x.value as value
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode,
         xpath|| '/' || y.name AS xpath, y.value as value
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
          name    varchar(30) PATH 'local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS y
) SELECT xpath, value
  FROM pathstable;

XPATH                           VALUE
------------------------------- --------------------
/Message                        UrgentRobertTester
/Message/Type                   Urgent
/Message/Person                 RobertTester
/Message/Person/id              123
/Message/Person/FirstName       Robert
/Message/Person/LastName        Tester

6 record(s) selected

A few things to note:

  • The value of an XML element is defined at the concatenation of all text nodes in the subtree under that element. This explains the values that you see for /Message and /Message/Person in the example above.
  • For longer paths you may need to increase the length of the VARCHAR(n) in the XMLTABLE function.
  • In DB2 you may receive warning SQL0347W, which says that this query might recursively run into an infinite loop. But, this would only happen if your XML document was infinitely deep, which isn't possible. So, you can safely ignore that warning.


About these ads

6 Responses to “How to list the paths of all elements in an XML document?”

  1. Stacey Cooper Says:

    Best solution and explanation I have found anywhere! Perfect!


  2. Thanks! I’m glad this is useful.

  3. Marcus Says:

    Excelent. Thank you very much. One question: It is possible do this with XQuery? I have tried but with no success!


  4. Hi… the first query in this post is XQuery, but without recursion.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 51 other followers

%d bloggers like this: