tokenize it !

January 25, 2010

In my previous post I wrote about XPath and XQuery functions in DB2 vs. SQL Server 2008. One particularly powerful function is tokenize. The tokenize function takes a string as input and splits it into several substrings based on a repeating delimiter. The delimiter can be a single character or a pattern defined by a regular expression. The tokenize function is available in DB2 for z/OS (here) and DB2 for Linux, UNIX, and Windows (here).   

Here is a very simple example where the string “1-3-26-43-74″ is split along the character “-”. This eample is executed in the Command Line Processor of DB2 for Linux, UNIX, and Windows:   


db2 => xquery tokenize("1-3-26-43-74", "-");

1
3
26
43
74
 
  5 record(s) selected.
 
db2 =>

You might sometimes see that XPath function names are prefixed with “fn:”, which is the namespace prefix for functions. However, the namespace for XPath functions is implicitly predeclared so that you can omit the prefix “fn:”. That is, you can simply use the function name tokenize instead of fn:tokenize, as I did above. 

Here is another example where the tokenize function comes in handy.  In this example an XML document describes (some of) the environment variables of a user account on a Linux machine. Some enviroment variables, such as PATH, CLASSPATH, etc. contain lists of paths delimited by the colon (:) sign.   


create table myenv(doc XML);

insert into myenv values('
<ENVIRONMENT os="Linux">
   <HOSTNAME>hermes</HOSTNAME>
   <PATH>/home/mnicola/bin:/usr/bin:/usr/local/bin:/opt/newapp/bin:
         /home/mnicola/sqllib/bin:/home/mnicola/sqllib/adm:
         /home/mnicola/sqllib/misc:/home/mnicola/sqllib/int</PATH>
   <MANPATH>/usr/local/man:/usr/share/man:/usr/X11R6/man</MANPATH>
   <CLASSPATH>.:/home/mnicola/sqllib/java/db2java.zip:/home/mnicola/sqllib/java/db2jcc.jar:
              /home/mnicola/sqllib/java/sqlj.zip:/home/mnicola/sqllib/java/runtime.zip:/test</CLASSPATH>
</ENVIRONMENT>');
 

Then you can use the following simple query in DB2 to read and tokenize the CLASSPATH and return each path on a separate row:

xquery
db2-fn:xmlcolumn("MYENV.DOC")/ENVIRONMENT[HOSTNAME="hermes"]/tokenize(CLASSPATH, ":");
 
--------------------------------
.
/home/mnicola/sqllib/java/db2java.zip
/home/mnicola/sqllib/java/db2jcc.jar
/home/mnicola/sqllib/java/sqlj.zip
/home/mnicola/sqllib/java/runtime.zip
/test
 
  6 record(s) selected.
 

To be precise, the tokenize function returns a sequence of strings. On a sequence you can apply positional predicates to pick selected items from the sequence. The following query uses the tokenize function with positional predicates [1], [2], and [3] to obtain the first, second, and third entry from the MANPATH enviroment variable: 


SELECT XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[1] '
                 passing doc as "d") as path1,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[2] '
                 passing doc as "d") as path2,
       XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[3] '
                 passing doc as "d") as path3
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' PASSING doc as "d");
 

PATH1                PATH2                PATH3
-------------------- -------------------- --------------------
/usr/local/man       /usr/share/man       /usr/X11R6/man
 
 1 record(s) selected.
 

In the next example the XPath function count is wrapped around tokenize to obtain the number of entries of the CLASSPATH.


SELECT XMLQUERY('$d/ENVIRONMENT/count(tokenize(CLASSPATH, ":")  )' 
                 passing doc as "d")  as length_classpath
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' passing doc as "d");

LENGTH_CLASSPATH 
-----------------------------
6 

 1 record(s) selected.
 

I’m sure you will find many more use cases where the tokenize function is helpful!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 28 other followers