tokenize it !
January 25, 2010
In my previous post I wrote about XPath and XQuery functions in DB2 vs. SQL Server 2008. One particularly powerful function is tokenize. The tokenize function takes a string as input and splits it into several substrings based on a repeating delimiter. The delimiter can be a single character or a pattern defined by a regular expression. The tokenize function is available in DB2 for z/OS (here) and DB2 for Linux, UNIX, and Windows (here).
Here is a very simple example where the string “1-3-26-43-74″ is split along the character “-”. This eample is executed in the Command Line Processor of DB2 for Linux, UNIX, and Windows:
db2 => xquery tokenize("1-3-26-43-74", "-");
1
3
26
43
74
5 record(s) selected.
db2 =>
You might sometimes see that XPath function names are prefixed with “fn:”, which is the namespace prefix for functions. However, the namespace for XPath functions is implicitly predeclared so that you can omit the prefix “fn:”. That is, you can simply use the function name tokenize instead of fn:tokenize, as I did above.
Here is another example where the tokenize function comes in handy. In this example an XML document describes (some of) the environment variables of a user account on a Linux machine. Some enviroment variables, such as PATH, CLASSPATH, etc. contain lists of paths delimited by the colon (:) sign.
create table myenv(doc XML);
insert into myenv values('
<ENVIRONMENT os="Linux">
<HOSTNAME>hermes</HOSTNAME>
<PATH>/home/mnicola/bin:/usr/bin:/usr/local/bin:/opt/newapp/bin:
/home/mnicola/sqllib/bin:/home/mnicola/sqllib/adm:
/home/mnicola/sqllib/misc:/home/mnicola/sqllib/int</PATH>
<MANPATH>/usr/local/man:/usr/share/man:/usr/X11R6/man</MANPATH>
<CLASSPATH>.:/home/mnicola/sqllib/java/db2java.zip:/home/mnicola/sqllib/java/db2jcc.jar:
/home/mnicola/sqllib/java/sqlj.zip:/home/mnicola/sqllib/java/runtime.zip:/test</CLASSPATH>
</ENVIRONMENT>');
Then you can use the following simple query in DB2 to read and tokenize the CLASSPATH and return each path on a separate row:
xquery
db2-fn:xmlcolumn("MYENV.DOC")/ENVIRONMENT[HOSTNAME="hermes"]/tokenize(CLASSPATH, ":");
--------------------------------
.
/home/mnicola/sqllib/java/db2java.zip
/home/mnicola/sqllib/java/db2jcc.jar
/home/mnicola/sqllib/java/sqlj.zip
/home/mnicola/sqllib/java/runtime.zip
/test
6 record(s) selected.
To be precise, the tokenize function returns a sequence of strings. On a sequence you can apply positional predicates to pick selected items from the sequence. The following query uses the tokenize function with positional predicates [1], [2], and [3] to obtain the first, second, and third entry from the MANPATH enviroment variable:
SELECT XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[1] '
passing doc as "d") as path1,
XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[2] '
passing doc as "d") as path2,
XMLQUERY('$d/ENVIRONMENT/tokenize(MANPATH, ":")[3] '
passing doc as "d") as path3
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' PASSING doc as "d");
PATH1 PATH2 PATH3
-------------------- -------------------- --------------------
/usr/local/man /usr/share/man /usr/X11R6/man
1 record(s) selected.
In the next example the XPath function count is wrapped around tokenize to obtain the number of entries of the CLASSPATH.
SELECT XMLQUERY('$d/ENVIRONMENT/count(tokenize(CLASSPATH, ":") )'
passing doc as "d") as length_classpath
FROM myenv
WHERE XMLEXISTS('$d/ENVIRONMENT[HOSTNAME="hermes"]' passing doc as "d");
LENGTH_CLASSPATH
-----------------------------
6
1 record(s) selected.
I’m sure you will find many more use cases where the tokenize function is helpful!


November 21, 2012 at 4:47 am
Get this error: SQLCODE = -16011, ERROR: THE RESULT OF AN INTERMEDIATE STEP EXPRESSION IN AN XQUERY PATH EXPRESSION CONTAINS AN ATOMIC VALUE
November 21, 2012 at 10:24 am
What was the query that you executed? And what did the document look like on which the query failed with this error?