How to migrate XML from LOB to XML columns
October 3, 2010
Prior to the emergence of native XML database technology, many XML applications have stored their XML documents in files systems, in LOB columns (large objects) in a database, or they have shredded the XML to relational tables.
A common question is how to migrate XML documents from a LOB column to an XML column. The benefit of an XML column is that the XML is stored natively, i.e. in a parsed format, so it becomes possible t0 index, query, and update the XML efficiently. Migrating XML from LOB to XML columns is easy. Let’s take a look at how you can do this in DB2 for Linux, UNIX, and Windows:
There are several different options, at least three:
- You can export the old table (with the LOB column) and then load into a new table where the LOB column has been replaced by an XML column. You can then drop the old table and rename the new table to the old table name.
- If you don’t want to export the table to the file system, you can use “load from cursor” to move the data directly from the old to the new table.
- You can add an XML column to the original table, copy the data from the LOB column to the XML column, and then drop the LOB column.
To illustrate these options, let’s first create a table with a LOB column and insert a couple of XML documents:
CREATE TABLE oldtable(id INT, doc CLOB);
Now we can try each of the three options.
The following commands export the “oldtable”, create a new table, and load the exported data into the new table:
EXPORT TO old.del OF DEL MODIFIED BY LOBSINFILE SELECT * FROM oldtable;
CREATE TABLE newtable1(id INT, doc XML);
LOAD FROM old.del OF DEL INSERT INTO newtable1(id,doc);
Yes, it’s really that simple.
Be aware that CLOB or BLOB columns are very forgiving, i.e. they allow you to insert pretty much anything and they don’t check whether any inserted XML is well-formed or not. If your LOB column contains any XML documents that are not well-formed then those document will be rejected when you try to load them into the XML column. The LOAD command gives you a summary of how many rows (documents) were successfully loaded and how many were rejected. In our example the load summary looks like this:
Number of rows read = 2
Number of rows skipped = 0
Number of rows loaded = 2
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 2
After your new table is loaded, you could drop the old table and rename the new one (subject to certain restrictions on the rename command):
DROP TABLE oldtable;
RENAME TABLE newtable TO oldtable;
2. Load from cursor
Loading from a cursor means that you read data from one table and load it into another table. Instead of exporting the “oldtable” you simply declare a cursor for it, which the LOAD command can use as input. In fact, the cursor declaration can contain any SQL query, and the result of that query then defines the input to the LOAD command:
CREATE TABLE newtable2(id INT, doc XML);
DECLARE cur1 CURSOR FOR SELECT id, doc FROM oldtable;
LOAD FROM cur1 OF CURSOR INSERT INTO newtable2(id,doc);
You could potentially also use an SQL insert statement such as “INSERT INTO newtable2 SELECT * FROM oldtable” to move data from one table to another. However, this requires all the data to be logged and you may run into a log full error. Also, the LOAD utility can write data to a table faster than an insert statement can.
Based on available system resources the LOAD utility automatically chooses an appropriate degree of parallelism and other tuning options. You can sometimes improve LOAD performance (and the degree of LOAD parallelism) if you increase DB2’s utility heap (util_heap_sz). For very large LOAD operations you may also want to review the general tuning guidelines for LOAD utility.
3. Add XML column, drop LOB column
You can use the following commands to add an XML column to the original table, copy the data from the LOB column to the XML column, and then drop the LOB column:
ALTER TABLE oldtable ADD COLUMN xmldoc XML;
UPDATE oldtable SET xmldoc = doc;
ALTER TABLE oldtable DROP COLUMN doc;
The potential problem here is that the UPDATE statement is a large transaction that writes a lot of data that needs to be logged. You might run out of log space. To avoid this problem, you could:
(a) use batch-wise updates, i.e. update 1000 rows, then commit, then update the next 1000 rows, and so on.
(b) use ” alter table oldtable activate not logged initially;” to temporarily disable logging for the table. (Note that this statement has effect only for a single transaction, so you must issue it before to the UPDATE statement and in the same transaction. For this to work in the DB2 Command Line Processor, autocommit must be disabled, e.g. “db2 +c -t”)
General migration considerations
No matter whether you create and populate tables with XML columns from scratch or migrate XML documents from LOB to XML columns, you always have a set of physical design choices for your XML column and table. These include the page size (large pages tend to be better for XML performance), XML inlining, using a separate table space or buffer pool for the XML data, and so on. These decisions depend on the nature of your applications that use your database. When in doubt, keep it simple. However, it is worthwhile to point out that LOB columns do not use any buffer pool while the XML columns do. Hence, the migration from LOB to XML column means that more data will be buffered than before. Hence, you may want to increase the existing buffer pool for the table, if you can.