XML Profiling – How to get a list of all elements and attributes…

May 26, 2010

Recently I got asked several times how to get a list of all elements and attributes from an XML document, or from all documents in an XML column. Here are some ideas of how you can get this information with SQL/XML queries.

As a simple example, let’s assume we have the following XML document stored in the XML column PORDER of the table “purchaseorder”:

<PurchaseOrder PoNum="5000" OrderDate="2006-02-18" Status="Unshipped">
 <item>
    <partid>100-100-01</partid>
    <name>Snow Shovel, Basic 22 inch</name>
    <quantity>3</quantity>
    <price>9.99</price>
 </item>
 <item>
    <partid>100-103-01</partid>
    <name>Snow Shovel, Super Deluxe 26 inch</name>
    <quantity>5</quantity>
    <price>49.99</price>
 </item>
</PurchaseOrder>

You can run the following query to get a list of all elements in the (depth-first) order in which they appear in the document:

SELECT T.*
FROM purchaseorder,
     XMLTABLE('$PORDER//*'
      COLUMNS
        seq       FOR ORDINALITY,
        element   VARCHAR(20) PATH 'name(.)',
        parent    VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;

SEQ    ELEMENT              PARENT
------ -------------------- --------------------
     1 PurchaseOrder
     2 item                 PurchaseOrder
     3 partid               item
     4 name                 item
     5 quantity             item
     6 price                item
     7 item                 PurchaseOrder
     8 partid               item
     9 name                 item
    10 quantity             item
    11 price                item

 11 record(s) selected.

How does that work? The XMLTABLE function produces one row for each XML node that is produced by the so-called row-generating expression, which in this case is //*. The variable $PORDER simply refers to the XML column in the table. The // navigates exhaustively through every branch and every level of the document, and the * matches any element that is encountered. Thus, one row is produced for each element.

Then the COLUMNS clause defines the content of each row in the result set. The column “seq” is defined with the keywords FOR ORDINALITY to simply number the rows sequentially. The column “name” uses the the XPath function name() to retrieve the name of each element. In the third column, “parent”, the XPath ../name() retrieves the name of the parent of each element, which provides us with a sense of the document structure.

If your element names are longer than 20 characters then you may need to increase the length of the VARCHAR(20) columns in the query above.

You probably noted that the query result above did not list any attributes that occur in the original XML document. To get all attributes, you can use //@* instead of //* :

SELECT T.*
FROM purchaseorder,
     XMLTABLE('$PORDER//@*'
      COLUMNS
       seq         FOR ORDINALITY,
       attribute   VARCHAR(20) PATH 'name(.)',
       parent      VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;

SEQ   ATTRIBUTE            PARENT
----- -------------------- --------------------
    1 PoNum                PurchaseOrder
    2 OrderDate            PurchaseOrder
    3 Status               PurchaseOrder

 3 record(s) selected.

But, most likely you would like to get a combined list of elements and attributes. To achieve this, combine the expressions //* and //@* as shown in the next query. The expression //(*, @*) means that you want to traverse the entire document and match any element or attribute that is encountered:

SELECT T.*
FROM purchaseorder,
     XMLTABLE('$PORDER//(*, @*)'
      COLUMNS
       seq    FOR ORDINALITY,
       node   VARCHAR(20) PATH 'name(.)',
       parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;

SEQ   NODE                 PARENT
----- -------------------- --------------------
    1 PurchaseOrder
    2 PoNum                PurchaseOrder
    3 OrderDate            PurchaseOrder
    4 Status               PurchaseOrder
    5 item                 PurchaseOrder
    6 partid               item
    7 name                 item
    8 quantity             item
    9 price                item
   10 item                 PurchaseOrder
   11 partid               item
   12 name                 item
   13 quantity             item
   14 price                item

 14 record(s) selected.

At this point you are probably wondering how to distinguish elements from attributes in the list above. Ok, we can extend the previous query to provide this extra information. The next query has the additional column “type” whose value is defined by an XQuery if-then-else expression that checks whether the current node is an attribute. If not, then we simply assume it’s an element:

SELECT T.*
FROM purchaseorder,
     XMLTABLE('$PORDER//(*, @*)'
      COLUMNS
       seq    FOR ORDINALITY,
       node   VARCHAR(20) PATH 'name(.)',
       type   VARCHAR(15) PATH 'if (self::attribute())
                                then "Attribute"
                                else "Element"',
       parent VARCHAR(20) PATH '../name(.)' ) AS T
WHERE poid = 5000;

SEQ   NODE                 TYPE            PARENT
----- -------------------- --------------- --------------------
    1 PurchaseOrder        Element
    2 PoNum                Attribute       PurchaseOrder
    3 OrderDate            Attribute       PurchaseOrder
    4 Status               Attribute       PurchaseOrder
    5 item                 Element         PurchaseOrder
    6 partid               Element         item
    7 name                 Element         item
    8 quantity             Element         item
    9 price                Element         item
   10 item                 Element         PurchaseOrder
   11 partid               Element         item
   12 name                 Element         item
   13 quantity             Element         item
   14 price                Element         item

 14 record(s) selected.

And finally, wouldn’t it be nice to also display the value of each element and attribute in this list? The next query includes the column “value” for this purpose. The condition “if (not(./*))” checks whether a node in the document has no children. If this is true, then it’s typically an attribute or a leaf element for which it makes sense to report a value:

SELECT T.*
FROM purchaseorder,
     XMLTABLE('$PORDER//(*, @*)'
      COLUMNS
        seq    FOR ORDINALITY,
        node   VARCHAR(20) PATH 'name(.)',
        value  VARCHAR(20) PATH 'if (not(./*))
                                 then substring(.,1,20) else ()',
        type   VARCHAR(15) PATH 'if (self::attribute())
                                 then "Attribute"
                                 else "Element"',
        parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;

SEQ NODE           VALUE               TYPE       PARENT
--- -------------- ------------------- ---------- -------------
  1 PurchaseOrder  -                   Element
  2 PoNum          5000                Attribute  PurchaseOrder
  3 OrderDate      2006-02-18          Attribute  PurchaseOrder
  4 Status         Unshipped           Attribute  PurchaseOrder
  5 item           -                   Element    PurchaseOrder
  6 partid         100-100-01          Element    item
  7 name           Snow Shovel, Basic  Element    item
  8 quantity       3                   Element    item
  9 price          9.99                Element    item
 10 item           -                   Element    PurchaseOrder
 11 partid         100-103-01          Element    item
 12 name           Snow Shovel, Super  Element    item
 13 quantity       5                   Element    item
 14 price          49.99               Element    item

 14 record(s) selected.

Ok, you probably get the picture. You can further refine and adjust these queries to your needs. Let me know if you have any questions on any of this…

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 51 other followers

%d bloggers like this: