tokenize it !

January 25, 2010

In my previous post I wrote about XPath and XQuery functions in DB2 vs. SQL Server 2008. One particularly powerful function is tokenize. The tokenize function takes a string as input and splits it into several substrings based on a repeating delimiter. The delimiter can be a single character or a pattern defined by a regular expression. The tokenize function is available in DB2 for z/OS (here) and DB2 for Linux, UNIX, and Windows (here).   

Here is a very simple example where the string “1-3-26-43-74″ is split along the character “-”. This eample is executed in the Command Line Processor of DB2 for Linux, UNIX, and Windows:   


db2 => xquery tokenize("1-3-26-43-74", "-");

1
3
26
43
74
 
  5 record(s) selected.
 
db2 =>

You might sometimes see that XPath function names are prefixed with “fn:”, which is the namespace prefix for functions. However, the namespace for XPath functions is implicitly predeclared so that you can omit the prefix “fn:”. That is, you can simply use the function name tokenize instead of fn:tokenize, as I did above. 

Here is another example where the tokenize function comes in handy.  In this example an XML document describes (some of) the environment variables of a user account on a Linux machine. Some enviroment variables, such as PATH, CLASSPATH, etc. contain lists of paths delimited by the colon (:) sign.   


create table myenv(doc XML);

insert into myenv values('
<ENVIRONMENT os="Linux">
   <HOSTNAME>hermes</HOSTNAME>
   <PATH>/home/mnicola/bin:/usr/bin:/usr/local/bin:/opt/newapp/bin:
         /home/mnicola/sqllib/bin:/home/mnicola/sqllib/adm:
         /home/mnicola/sqllib/misc:/home/mnicola/sqllib/int</PATH>
   <MANPATH>/usr/local/man:/usr/share/man:/usr/X11R6/man</MANPATH>
   <CLASSPATH>.:/home/mnicola/sqllib/java/db2java.zip:/home/mnicola/sqllib/java/db2jcc.jar:
              /home/mnicola/sqllib/java/sqlj.zip:/home/mnicola/sqllib/java/runtime.zip:/test</CLASSPATH>
</ENVIRONMENT>');
 

Then you can use the following simple query in DB2 to read and tokenize the CLASSPATH and return each path on a separate row:

xquery
db2-fn:xmlcolumn("MYENV.DOC")/ENVIRONMENT[HOSTNAME="hermes"]/tokenize(CLASSPATH, ":");
 
--------------------------------
.
/home/mnicola/sqllib/java/db2java.zip
/home/mnicola/sqllib/java/db2jcc.jar
/home/mnicola/sqllib/java/sqlj.zip
/home/mnicola/sqllib/java/runtime.zip
/test
 
  6 record(s) selected.
 

To be precise, the tokenize function returns a sequence of strings. On a sequence you can apply positional predicates to pick selected items from the sequence. The following query uses the tokenize function with positional predicates [1], [2], and [3] to obtain the first, second, and third entry from the MANPATH enviroment variable: 


SELECT XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[1] '
                 passing doc as "d") as path1,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[2] '
                 passing doc as "d") as path2,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[3] '
                 passing doc as "d") as path3
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' PASSING doc as "d");
 

PATH1                PATH2                PATH3
-------------------- -------------------- --------------------
/usr/local/man       /usr/share/man       /usr/X11R6/man
 
 1 record(s) selected.
 

In the next example the XPath function count is wrapped around tokenize to obtain the number of entries of the CLASSPATH.


SELECT XMLQUERY('$d/ENVIRONMENT/count(tokenize(CLASSPATH, ":")  )' 
                 passing doc as "d")  as length_classpath
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' passing doc as "d");

LENGTH_CLASSPATH 
-----------------------------
6 

 1 record(s) selected.
 

I’m sure you will find many more use cases where the tokenize function is helpful!

About these ads

5 Responses to “tokenize it !”

  1. lilia medina Says:

    Get this error: SQLCODE = -16011, ERROR: THE RESULT OF AN INTERMEDIATE STEP EXPRESSION IN AN XQUERY PATH EXPRESSION CONTAINS AN ATOMIC VALUE


  2. What was the query that you executed? And what did the document look like on which the query failed with this error?

  3. Walter Huth Says:

    Hi Matthias,
    I get the same error as lilia medina in DB2 10 for z/OS when trying to get one item out of a sequence produced by tokenize().
    First the data, then the error:
    ———+———+———+———+———+———+———+
    SELECT XMLQUERY (‘$i/customerinfo/tokenize(phone[@type="home"],”-”)’
    PASSING INFO AS “i” )
    AS PHONE_NO
    FROM XMLCUSTOMER
    WHERE XMLEXISTS(‘$i/customerinfo/phone[@type="home"]‘
    PASSING INFO AS “i” ) ;
    ———+———+———+———+———+———+———+
    PHONE_NO
    ———+———+———+———+———+———+———+
    408 555 1375
    DSNE610I NUMBER OF ROWS DISPLAYED IS 1
    ———+———+———+———+———+———+———+
    SELECT XMLQUERY (‘$i/customerinfo/tokenize(phone[@type="home"],”-”)[1]‘
    PASSING INFO AS “i” )
    AS AREA_CODE
    FROM XMLCUSTOMER
    WHERE XMLEXISTS(‘$i/customerinfo/phone[@type="home"]‘
    PASSING INFO AS “i” )
    ———+———+———+———+———+———+———+–
    DSNT408I SQLCODE = -16011, ERROR: THE RESULT OF AN INTERMEDIATE STEP
    EXPRESSION IN AN XQUERY PATH EXPRESSION CONTAINS AN ATOMIC VALUE.
    ERROR QNAME=err:XPTY0019
    DSNT418I SQLSTATE = 10507 SQLSTATE RETURN CODE
    DSNT415I SQLERRP = DSNXQXPS SQL PROCEDURE DETECTING ERROR
    DSNT416I SQLERRD = -600 0 0 -1 0 0 SQL DIAGNOSTIC INFORMATION
    DSNT416I SQLERRD = X’FFFFFDA8′ X’00000000′ X’00000000′ X’FFFFFFFF’
    X’00000000′ X’00000000′ SQL DIAGNOSTIC INFORMATION
    ———+———+———+———+———+———+———+–
    Any suggestions? Best greetings – Walter


    • Hi Walter,

      while these queries work fine in DB2 LUW, I understand that DB2 for z/OS has some limitations to how positional predicates (such as “[1]” in this query) can be applied.

      In DB2 10 for z/OS you can try to use the following workaround:

      SELECT XMLQUERY (‘for $x at $p in $i/customerinfo/tokenize(phone[@type="home"],”-”)
      where $p = 1
      return $x’
      PASSING INFO AS “i” ) AS area_code
      FROM CUSTOMER
      WHERE XMLEXISTS(‘$i/customerinfo/phone[@type="home"]‘
      PASSING INFO AS “i” ) ;

      The “at $p” in the FLWOR expression acts like a positional counter that iterates over the items produced by the XPath expression, and then the “where” clause selects the item at position 1.

      Does this work for you?

      Cheers,

      Matthias

  4. Walter Huth Says:

    Hi, Matthias,
    just checked your solution in DB2 for z/OS: Works great!
    Thank you very much – Walter


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 50 other followers

%d bloggers like this: