How to migrate XML from LOB to XML columns

October 3, 2010

Prior to  the emergence of native XML database technology, many XML applications have stored their XML documents in files systems, in LOB columns (large objects) in a database, or they have shredded the XML to relational tables.

A common question is how to migrate XML documents from a LOB column to an XML column. The benefit of an XML column is that the XML is stored natively, i.e. in a parsed format, so it becomes possible t0 index, query, and update the XML efficiently. Migrating XML from LOB to XML columns is easy. Let’s take a look at how you can do this in DB2 for Linux, UNIX, and Windows:

There are several different options, at least three:

  1. You can export the old table (with the LOB column) and then load into a new table where the LOB column has been replaced by an XML column. You can then drop the old table and rename the new table to the old table name.
  2. If you don’t want to export the table to the file system, you can use “load from cursor” to move the data directly from the old to the new table.
  3. You can add an XML column to the original table, copy the data from the LOB column to the XML column, and then drop the LOB column.

To illustrate these options, let’s first create a table with a LOB column and insert a couple of XML documents:

CREATE TABLE oldtable(id INT, doc CLOB);

INSERT INTO oldtable VALUES(1, ‘<url></url>&#8217;);
INSERT INTO oldtable VALUES(2, ‘<url></url>&#8217;);

Now we can try each of the three options.

1. Export + Load

The following commands export the “oldtable”, create a new table, and load the exported data into the new table:

CREATE TABLE newtable1(id INT, doc XML);
LOAD FROM old.del OF DEL INSERT INTO newtable1(id,doc);

Yes, it’s really that simple.

Be aware that CLOB or BLOB columns are very forgiving, i.e. they allow you to insert pretty much anything and they don’t check whether any inserted XML is well-formed or not. If your LOB column contains any XML documents that are not well-formed then those document will be rejected when you try to load them into the XML column. The LOAD command gives you a summary of how many rows (documents) were successfully loaded and how many were rejected. In our example the load summary looks like this:

Number of rows read         = 2
Number of rows skipped      = 0
Number of rows loaded       = 2
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 2

After your new table is loaded, you could drop the old table and rename the new one (subject to certain restrictions on the rename command):

DROP TABLE oldtable;
RENAME TABLE newtable TO oldtable;

2. Load from cursor

Loading from a cursor means that you read data from one table and load it into another table. Instead of exporting the “oldtable” you simply declare a cursor for it, which the LOAD command can use as input. In fact, the cursor declaration can contain any SQL query, and the result of that query then defines the input to the LOAD command:

CREATE TABLE newtable2(id INT, doc XML);
DECLARE cur1 CURSOR FOR SELECT id, doc FROM oldtable;
LOAD FROM cur1 OF CURSOR INSERT INTO newtable2(id,doc);

You could potentially also use an SQL insert statement such as “INSERT INTO newtable2 SELECT * FROM oldtable” to move data from one table to another. However, this requires all the data to be logged and you may run into a log full error. Also, the LOAD utility can write data to a table faster than an insert statement can.

Based on available system resources the LOAD utility automatically chooses an appropriate degree of parallelism and other tuning options. You can sometimes improve LOAD performance (and the degree of LOAD parallelism) if you increase DB2’s utility heap (util_heap_sz). For very large LOAD operations you may also want to review the general tuning guidelines for LOAD utility.

3. Add XML column, drop LOB column

You can use the following commands to add an XML column to the original table, copy the data from the LOB column to the XML column, and then drop the LOB column:

UPDATE oldtable SET xmldoc = doc;

The potential problem here is that the UPDATE statement is a large transaction that writes a lot of data that needs to be logged. You might run out of log space.  To avoid this problem, you could:
(a) use batch-wise updates, i.e. update 1000 rows, then commit, then update the next 1000 rows, and so on.
(b) use ” alter table oldtable activate not logged initially;” to temporarily disable logging for the table. (Note that this statement has effect only for a single transaction, so you must issue it before to the UPDATE statement and in the same transaction. For this to work in the DB2 Command Line Processor, autocommit must be disabled, e.g. “db2 +c -t”)

General migration considerations

No matter whether you create and populate tables with XML columns from scratch or migrate XML documents from LOB to XML columns, you always have a set of physical design choices for your XML column and table. These include the page size (large pages tend to be better for XML performance), XML inlining, using a separate table space or buffer pool for the XML data, and so on. These decisions depend on the nature of your applications that use your database. When in doubt, keep it simple. However, it is worthwhile to point out that LOB columns do not use any buffer pool while the XML columns do. Hence, the migration from LOB to XML column means that more data will be buffered than before. Hence, you may want to increase the existing buffer pool for the table, if you can.


7 Responses to “How to migrate XML from LOB to XML columns”

  1. Another option to convert a LOB column to an XML column can be the ADMIN_MOVE_TABLE procedure. This procedure creates a copy of a table but allows you to change column options, data types, storage options, even the number of columns in the table.

  2. Victor Says:

    db2 “update infomed.tio030_item set xml_item_information=item_information”

    • Victor Says:

      When I run the above statement, I get a SQL16132N XML document contains an invalid document structure.
      SQLSTATE=2200M. Any idea why this might be the case? The source column is of type XMLCLOB (XML extender), and the documents should be well formed.

      Many thanks for the informative blog.

  3. Hi Victor,
    thanks for your msg here.
    With the XML Extender it is possible to insert documents into the XMLCLOB column without verification that the document is well-formed. As a first step I would examine the document to check whether it is indeed well-formed or not. For example, open the XML document in a web browser to see if it gets parsed correctly.

    If the document appears to be well-formed, examine the document with a hex editor to check the first few characters in the document. Does the document have a UTF byte-order mark? And does it have an XML declaration without an encoeding attribute? For details on why that might matter, see:

    Also, which version and fixpack of DB2 are you running?



    • Victor Says:

      Many thanks for the reply, Matthias.

      We are using a special build based on DB2 9.5 FP8.

      I believe the problem is the that the documents stored in the XMLCLOB column contain.

      If I remove this tag, the document loads into the XML column without issue. But with it, I get the SQL16132N errors described previously.

      Is there any way at all to load documents with this tag, or would it be necessary to remove it before loading? If it is necessary to remove it, it would seem to me that migration of large environments could be quite difficult.

      Is there any technique yo could suggest to get around this issue?

      Many thanks again for your insight.

      • Victor Says:

        Looks like the website removed the XML tag.. I am referring to a DTD tag similar to DOCTYPE Items SYSTEM “xx.dtd”.

        This is the tag that is preventing the document from being loaded into an XML column.

  4. Hi Victor…. ah, I see. If an XML document refers to an external DTD using the DOCTYPE tag, then this DTD must be registered in DB2’s schema repository (XSR) before you can insert or LOAD this document.

    The reason is the following: A DTD can define
    default values for attributes, and it can also define so-called entities that can be referenced in the XML documents.
    To store a correct representation of such documents, DB2 needs to access the DTD to check for default attribute values and to resolve entity references as needed.

    Here is a sample command for how to register a DTD in DB2:

    REGISTER XSROBJECT ‘customer.dtd’ FROM ‘file:c:/xml/DTDs/customer.dtd’ AS db2admin.custdtd;

    Does that help?


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: