tokenize it !

January 25, 2010

In my previous post I wrote about XPath and XQuery functions in DB2 vs. SQL Server 2008. One particularly powerful function is tokenize. The tokenize function takes a string as input and splits it into several substrings based on a repeating delimiter. The delimiter can be a single character or a pattern defined by a regular expression. The tokenize function is available in DB2 for z/OS (here) and DB2 for Linux, UNIX, and Windows (here).   

Here is a very simple example where the string “1-3-26-43-74″ is split along the character “-”. This eample is executed in the Command Line Processor of DB2 for Linux, UNIX, and Windows:   


db2 => xquery tokenize("1-3-26-43-74", "-");

1
3
26
43
74
 
  5 record(s) selected.
 
db2 =>

You might sometimes see that XPath function names are prefixed with “fn:”, which is the namespace prefix for functions. However, the namespace for XPath functions is implicitly predeclared so that you can omit the prefix “fn:”. That is, you can simply use the function name tokenize instead of fn:tokenize, as I did above. 

Here is another example where the tokenize function comes in handy.  In this example an XML document describes (some of) the environment variables of a user account on a Linux machine. Some enviroment variables, such as PATH, CLASSPATH, etc. contain lists of paths delimited by the colon (:) sign.   


create table myenv(doc XML);

insert into myenv values('
<ENVIRONMENT os="Linux">
   <HOSTNAME>hermes</HOSTNAME>
   <PATH>/home/mnicola/bin:/usr/bin:/usr/local/bin:/opt/newapp/bin:
         /home/mnicola/sqllib/bin:/home/mnicola/sqllib/adm:
         /home/mnicola/sqllib/misc:/home/mnicola/sqllib/int</PATH>
   <MANPATH>/usr/local/man:/usr/share/man:/usr/X11R6/man</MANPATH>
   <CLASSPATH>.:/home/mnicola/sqllib/java/db2java.zip:/home/mnicola/sqllib/java/db2jcc.jar:
              /home/mnicola/sqllib/java/sqlj.zip:/home/mnicola/sqllib/java/runtime.zip:/test</CLASSPATH>
</ENVIRONMENT>');
 

Then you can use the following simple query in DB2 to read and tokenize the CLASSPATH and return each path on a separate row:

xquery
db2-fn:xmlcolumn("MYENV.DOC")/ENVIRONMENT[HOSTNAME="hermes"]/tokenize(CLASSPATH, ":");
 
--------------------------------
.
/home/mnicola/sqllib/java/db2java.zip
/home/mnicola/sqllib/java/db2jcc.jar
/home/mnicola/sqllib/java/sqlj.zip
/home/mnicola/sqllib/java/runtime.zip
/test
 
  6 record(s) selected.
 

To be precise, the tokenize function returns a sequence of strings. On a sequence you can apply positional predicates to pick selected items from the sequence. The following query uses the tokenize function with positional predicates [1], [2], and [3] to obtain the first, second, and third entry from the MANPATH enviroment variable: 


SELECT XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[1] '
                 passing doc as "d") as path1,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[2] '
                 passing doc as "d") as path2,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[3] '
                 passing doc as "d") as path3
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' PASSING doc as "d");
 

PATH1                PATH2                PATH3
-------------------- -------------------- --------------------
/usr/local/man       /usr/share/man       /usr/X11R6/man
 
 1 record(s) selected.
 

In the next example the XPath function count is wrapped around tokenize to obtain the number of entries of the CLASSPATH.


SELECT XMLQUERY('$d/ENVIRONMENT/count(tokenize(CLASSPATH, ":")  )' 
                 passing doc as "d")  as length_classpath
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' passing doc as "d");

LENGTH_CLASSPATH 
-----------------------------
6 

 1 record(s) selected.
 

I’m sure you will find many more use cases where the tokenize function is helpful!

About these ads

2 Responses to “tokenize it !”

  1. lilia medina Says:

    Get this error: SQLCODE = -16011, ERROR: THE RESULT OF AN INTERMEDIATE STEP EXPRESSION IN AN XQUERY PATH EXPRESSION CONTAINS AN ATOMIC VALUE


  2. What was the query that you executed? And what did the document look like on which the query failed with this error?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 43 other followers

%d bloggers like this: