XML Profiling – How to get a list of all elements and attributes…
May 26, 2010
Recently I got asked several times how to get a list of all elements and attributes from an XML document, or from all documents in an XML column. Here are some ideas of how you can get this information with SQL/XML queries.
As a simple example, let’s assume we have the following XML document stored in the XML column PORDER of the table “purchaseorder”:
<PurchaseOrder PoNum="5000" OrderDate="2006-02-18" Status="Unshipped">
<item>
<partid>100-100-01</partid>
<name>Snow Shovel, Basic 22 inch</name>
<quantity>3</quantity>
<price>9.99</price>
</item>
<item>
<partid>100-103-01</partid>
<name>Snow Shovel, Super Deluxe 26 inch</name>
<quantity>5</quantity>
<price>49.99</price>
</item>
</PurchaseOrder>
You can run the following query to get a list of all elements in the (depth-first) order in which they appear in the document:
SELECT T.*
FROM purchaseorder,
XMLTABLE('$PORDER//*'
COLUMNS
seq FOR ORDINALITY,
element VARCHAR(20) PATH 'name(.)',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;
SEQ ELEMENT PARENT
------ -------------------- --------------------
1 PurchaseOrder
2 item PurchaseOrder
3 partid item
4 name item
5 quantity item
6 price item
7 item PurchaseOrder
8 partid item
9 name item
10 quantity item
11 price item
11 record(s) selected.
How does that work? The XMLTABLE function produces one row for each XML node that is produced by the so-called row-generating expression, which in this case is //*. The variable $PORDER simply refers to the XML column in the table. The // navigates exhaustively through every branch and every level of the document, and the * matches any element that is encountered. Thus, one row is produced for each element.
Then the COLUMNS clause defines the content of each row in the result set. The column “seq” is defined with the keywords FOR ORDINALITY to simply number the rows sequentially. The column “name” uses the the XPath function name() to retrieve the name of each element. In the third column, “parent”, the XPath ../name() retrieves the name of the parent of each element, which provides us with a sense of the document structure.
If your element names are longer than 20 characters then you may need to increase the length of the VARCHAR(20) columns in the query above.
You probably noted that the query result above did not list any attributes that occur in the original XML document. To get all attributes, you can use //@* instead of //* :
SELECT T.*
FROM purchaseorder,
XMLTABLE('$PORDER//@*'
COLUMNS
seq FOR ORDINALITY,
attribute VARCHAR(20) PATH 'name(.)',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;
SEQ ATTRIBUTE PARENT
----- -------------------- --------------------
1 PoNum PurchaseOrder
2 OrderDate PurchaseOrder
3 Status PurchaseOrder
3 record(s) selected.
But, most likely you would like to get a combined list of elements and attributes. To achieve this, combine the expressions //* and //@* as shown in the next query. The expression //(*, @*) means that you want to traverse the entire document and match any element or attribute that is encountered:
SELECT T.*
FROM purchaseorder,
XMLTABLE('$PORDER//(*, @*)'
COLUMNS
seq FOR ORDINALITY,
node VARCHAR(20) PATH 'name(.)',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;
SEQ NODE PARENT
----- -------------------- --------------------
1 PurchaseOrder
2 PoNum PurchaseOrder
3 OrderDate PurchaseOrder
4 Status PurchaseOrder
5 item PurchaseOrder
6 partid item
7 name item
8 quantity item
9 price item
10 item PurchaseOrder
11 partid item
12 name item
13 quantity item
14 price item
14 record(s) selected.
At this point you are probably wondering how to distinguish elements from attributes in the list above. Ok, we can extend the previous query to provide this extra information. The next query has the additional column “type” whose value is defined by an XQuery if-then-else expression that checks whether the current node is an attribute. If not, then we simply assume it’s an element:
SELECT T.*
FROM purchaseorder,
XMLTABLE('$PORDER//(*, @*)'
COLUMNS
seq FOR ORDINALITY,
node VARCHAR(20) PATH 'name(.)',
type VARCHAR(15) PATH 'if (self::attribute())
then "Attribute"
else "Element"',
parent VARCHAR(20) PATH '../name(.)' ) AS T
WHERE poid = 5000;
SEQ NODE TYPE PARENT
----- -------------------- --------------- --------------------
1 PurchaseOrder Element
2 PoNum Attribute PurchaseOrder
3 OrderDate Attribute PurchaseOrder
4 Status Attribute PurchaseOrder
5 item Element PurchaseOrder
6 partid Element item
7 name Element item
8 quantity Element item
9 price Element item
10 item Element PurchaseOrder
11 partid Element item
12 name Element item
13 quantity Element item
14 price Element item
14 record(s) selected.
And finally, wouldn’t it be nice to also display the value of each element and attribute in this list? The next query includes the column “value” for this purpose. The condition “if (not(./*))” checks whether a node in the document has no children. If this is true, then it’s typically an attribute or a leaf element for which it makes sense to report a value:
SELECT T.*
FROM purchaseorder,
XMLTABLE('$PORDER//(*, @*)'
COLUMNS
seq FOR ORDINALITY,
node VARCHAR(20) PATH 'name(.)',
value VARCHAR(20) PATH 'if (not(./*))
then substring(.,1,20) else ()',
type VARCHAR(15) PATH 'if (self::attribute())
then "Attribute"
else "Element"',
parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000;
SEQ NODE VALUE TYPE PARENT
--- -------------- ------------------- ---------- -------------
1 PurchaseOrder - Element
2 PoNum 5000 Attribute PurchaseOrder
3 OrderDate 2006-02-18 Attribute PurchaseOrder
4 Status Unshipped Attribute PurchaseOrder
5 item - Element PurchaseOrder
6 partid 100-100-01 Element item
7 name Snow Shovel, Basic Element item
8 quantity 3 Element item
9 price 9.99 Element item
10 item - Element PurchaseOrder
11 partid 100-103-01 Element item
12 name Snow Shovel, Super Element item
13 quantity 5 Element item
14 price 49.99 Element item
14 record(s) selected.
Ok, you probably get the picture. You can further refine and adjust these queries to your needs. Let me know if you have any questions on any of this…

