Get Started with XML Full-Text Search in DB2

February 21, 2010

If you perform data-oriented queries on XML document, then your queries typically contain predicates on dates, numbers, names, zip codes, and other short values. For such queries you define regular XML indexes to speed up the predicate evaluation. You can read about that in the article Exploit XML indexes for XML query performance in DB2 or in Chapter 13 of the DB2 pureXML Cookbook.

In contrast, there are also situations where you may want to perform text-oriented XML queries. In this case your queries may have to check whether a certain XML element, or a whole document, contains a certain search term. You can also use wildcards to perform substring or prefix searches.

Below you find a simple and commented example of how to get started with DB2 Text Search for XML Data. DB2 Text Search is powered by the Omnifind search engine, which is based on Lucene. I have developed the examples below on DB2 Express-C 9.7 for Windows, which is freely available.

First we create a database and a table with a couple of XML documents:

-- At the OS prompt, issue the following commands to create and connect to a database:
db2 create database mytest
db2 connect to mytest

-- Let's tart the DB2 Command Line Processor to
--  create and populate a sample table with two XML documents:
db2 -t

-- Issue the "create table" and "insert" SQL statements:
create table order(order_id integer not null primary key, doc XML);

insert into order values (1000,
'<Order OrderDate="2006-03-01" Status="Shipped">
   <item price="9.99">
      <partid>SNS9471B001</partid>
      <name>Snow Shovel, Basic 22 inch</name>
      <quantity>3</quantity>
   </item>
   <ShippingInstructions>Rush Delivery Requested</ShippingInstructions>
</Order>');

insert into order values (2000,
'<Order OrderDate="2006-04-15" Status="Unshipped">
   <item price="19.99">
      <partid>SSD9483B007</partid>
      <name>Snow Shovel, Deluxe 24 inch</name>
      <quantity>5</quantity>
   </item>
   <item price = "3.49">
      <partid>ICR3117F013</partid>
      <name>Ice Scraper, Windshield 4 inch</name>
      <quantity>5</quantity>
   </item>
</Order>');

Now we need to start the DB2 text search service and enable our test database for text search.

--At the OS command line, the following command starts the text search service:
db2ts start for text

-- For convenience, set an enviroment variable to indicate the
-- database to which text search administration commands are applied.
-- Depending on your OS, use the SET or the EXPORT command:
SET DB2DBDFT=mytest

-- The next step is to enable the database 'mytest" for text search,
-- requires the DB2 instance owner to have DBADM authority. The way
-- my DB2 instance is installed on Windows, I need to grant admin
-- rights to "system" and "localsystem":
db2 grant dbadm on database to system
db2 grant dbadm on database to localsystem

-- Now enable our database "mytest" for text search. This creates tables
-- and triggers for housekeeping information.
db2ts enable database for text

-- Create a text index on the XML column of the order table. The
-- CREATE command only defines the text index but does not populate
-- it. Therefore we also issue the UPDATE command:
db2ts "CREATE INDEX ordertextindex FOR TEXT ON order(doc)"
db2ts "UPDATE INDEX ordertextindex FOR TEXT"

-- You can also configure peridic automatic updates of the text
-- index (omitted here).

Now we are ready to try some text search queries, using SQL/XML or XQuery. Let’s start with SQL/XML. The specific syntax of the search expressions within the contains function allows DB2 to use the text index. Compare each query and its result to the two sample XML documents that we inserted above.

-- (1) Identify all orders that contain special shipping instructions.
-- This a structural search, because it looks for the existence of
-- the element <ShippingInstructions> regardless of its value.

SELECT order_id,
       xmlcast(xmlquery('$DOC/Order/@OrderDate') as Date) as odate
FROM order
WHERE contains(doc, '@xpath:''/Order/ShippingInstructions''')=1;

ORDER_ID    ODATE
----------- ----------
       1000 03/01/2006

  1 record(s) selected.


-- (2) Identify all orders for any kind of shovels, i.e. orders where
-- any item name contains the word "shovel".

SELECT order_id, xmlcast(xmlquery('$DOC/Order/@OrderDate') as Date) as odate
FROM order
WHERE contains(doc, '@xpath:''/Order/item/name[. contains("Shovel")]''')=1;

ORDER_ID    ODATE
----------- ----------
       1000 03/01/2006
       2000 04/15/2006

  2 record(s) selected.


-- Use a wildcard (*) to identify all orders that contain an item
-- whose "partid" starts with "SNS9":

SELECT order_id,
       xmlcast(xmlquery('$DOC/Order/@OrderDate') as Date) as odate
FROM order
WHERE contains(doc, '@xpath:''/Order/item/partid[. contains("SNS9*")]''')=1;

ORDER_ID    ODATE
----------- ----------
       1000 03/01/2006

  1 record(s) selected.


-- Identify all orders that contain an item whose "partid"
-- ends in "F013":

SELECT order_id,
       xmlcast(xmlquery('$DOC/Order/@OrderDate') as Date) as odate
FROM order
WHERE contains(doc, '@xpath:''/Order/item/partid[. contains("*F013")]''')=1;

ORDER_ID    ODATE
----------- ----------
       2000 04/15/2006

  1 record(s) selected.

-- On XML attributes, the text index also supports numeric searches, e.g:
-- Find all orders that have an item with a price greater than 10:

SELECT order_id,
       xmlcast(xmlquery('$DOC/Order/@OrderDate') as Date) as odate
FROM order
WHERE contains(doc, '@xpath:''/Order/item[@price > 10]''')=1;

ORDER_ID    ODATE
----------- ----------
       2000 04/15/2006

  1 record(s) selected.

-- But, if numeric searches are frequent, then you better use a regular
-- XML index of type DOUBLE.

The same kind of search conditions can also be performed in XQuery, using the function db2-fn:xmlcolumn-contains:


-- Return the order documents that include any item whose name
-- contains the word "shovel":
xquery
for $i in db2-fn:xmlcolumn-contains('ORDER.DOC',
                '@xpath:''/Order/item/name[. contains("Shovel")]''' )
return $i

-- Return the individual items whose partid starts with "ICR3". The text
-- search predicate, and hence the text index, is used to quickly
-- identify the orders that contain at least one matching item. In the
-- return clause we then use the XQuery starts-with() function to not
-- return ALL items from the matching orders, but only those items
-- whose partid starts with "ICR3". This is because a single order
-- document can contain matching and non-matching items for our search.
xquery
for $i in db2-fn:xmlcolumn-contains('ORDER.DOC',
                '@xpath:''/Order/item/partid[. contains("ICR3*")]''' )
return $i/Order/item[starts-with(partid,"ICR3")];

DB2 Text Search also provides many advanced text search features such as:

  • wildcard search
  • scoring and ranking of search results
  • fuzzy search
  • weighting or boosting of search terms
  • stemming (search for any stemmed form of a word)
  • proximity search
  • synonym search
  • linguistic search
  • etc.

Check out the following resources:

About these ads

2 Responses to “Get Started with XML Full-Text Search in DB2”

  1. Arun Srini Says:

    I don’t think db2ts is available in express-C atleast. I am using the 9.7 udb in windows and have searched the filesystem and failed


    • Hi Arun,
      you raise an important point! I believe DB2 Text Search is an install option for DB2 Express-C. When you install DB2 Express-C, I think you need to choose a “custom install” and then explicitly select the DB2 Text Search feature, which is an optional component. I also saw this mentioned in the free DB2 Express-C book, but it wasn’t mentioned when Raul Chong first wrote about text search in DB2 Express-C 9.5.

      Thanks,

      Matthias


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 52 other followers

%d bloggers like this: