Watch your step! Positional predicates in XPath.

January 11, 2010

Some time ago I received an interesting question about how to code a specific XPath expression. I think the question and the answer are worthwhile sharing here. 

The question: 

Consider the following XML document. The document describes an article that has a title, a publication date, and several sections. Each section has a body, and some sections also contain a reference to a figure. Admittedly, I contrived this document to be able to explain this issue in a simple way. 

 <article id="62083">
  <title>A short story about apple pie</title>
  <pubdate>2009-08-21T09:30:00</pubdate>
  <section>
      <body>Once upon a time...</body>
  </section>
  <section>
      <body>...</body>
      <figure>drawing15.jpg</figure>
  </section>
  <section>
      <body>...</body>
      <figure>screenshot85.tif</figure>
  </section>
  <section>
      <body>...</body>
  </section>
</article>

Now the question is: how can I retrieve the first or the last <figure> from the article? The answer is slightly more tricky than what you might think! You probably know that XPath allows for positional predicates that choose elements based on the order in which they appear. For example, if the variable $doc holds the document shown above, then the following XPath expressions retrieve the first and the last <section> of the article, respectively:

$doc/article/section[1]
$doc/article/section[last()]

But, the question was how to retrieve the first or last <figure>. Intuitively, one might try the following XPath expression to obtain the first <figure> element:

$doc/article/section/figure[1]

However, this XPath returns both <figure> elements from the document:    

    <figure>drawing15.jpg</figure>
    <figure>screenshot85.tif</figure> 

This result may seem unexpected, but it is perfectly correct. The XPath expression has asked for the first figure within each section, and that’s exactly what we got: the first figure from each section. It’s just not what we wanted.   

The solution:

There are several ways to retrieve the first <figure> of the entire document rather than the first <figure> of each section. One approach is to use parentheses in order to apply the positional predicate [1] to the sequence of all <figure> elements, rather than to the figures within each section:

($doc/article/section/figure)[1]

Another approach is to navigate to the first section that actually contains a <figure>, and then to return the first <figure> element from that section:

$doc/article/section[figure][1]/figure[1]

Note that the XPath step section[figure][1] uses two predicates. The predicate [figure] selects all sections that contain a figure, and the predicate [1] selects the first of those sections. Similarly, you can use either one of the following two XPath expressions to retrieve the last <figure> element from our sample document:

($doc/article/section/figure)[last()]
$doc/article/section[figure][last()]/figure[last()]

With the attached script you can try it out yourself – in DB2 for Linux, UNIX, and Windows or DB2 for z/OS. (Click on the “source” link below.)

 

create table mytest(mydoc XML);
 

insert into mytest(mydoc) values('<article id="62083">
 <title>A short story about apple pie</title>
 <pubdate>2009-08-21T09:30:00</pubdate>
 <section>
  <body>Once upon a time...</body>
 </section>
 <section>
  <body>...</body>
  <figure>drawing15.jpg</figure>
 </section>
 <section>
  <body>...</body>
  <figure>screenshot85.tif</figure>
 </section>
 <section>
  <body>...</body>
 </section>
</article>');
 

-- SQL/XML (DB2 for z/OS and DB2 for Linux, UNIX, and Windows):

SELECT XMLQUERY('$doc/article/section/figure[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY('$doc/article/section/figure[last()]' 
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY(' ($doc/article/section/figure)[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY(' ($doc/article/section/figure)[last()]'
                 PASSING mydoc as "doc")
FROM mytest;
 

SELECT XMLQUERY('$doc/article/section[figure][1]/figure[1]'
                 PASSING mydoc as "doc")
FROM mytest;
 

-- XQuery (DB2 for Linux, UNIX, and Windows only):

xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure[1];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure[last()];
xquery (db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure)[1];
xquery (db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section/figure)[last()];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section[figure][1]/figure[1];
xquery db2-fn:xmlcolumn("MYTEST.MYDOC")/article/section[figure][last()]/figure[last()];

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 50 other followers

%d bloggers like this: