How to insert XML documents from the file system?

December 26, 2010

How to insert XML documents from the file system? Let’s discuss this common question for both, DB2 on z/OS and DB2 on Linux, UNIX, and Windows (LUW).

DB2 for z/OS offers two good options for moving XML documents from the file system into XML columns or LOB or varchar columns. The first option is to use the DB2 LOAD utility, which is described here. The second options is to use File Reference Variables in your application code to read LOB or XML files from the file system. This is a native solution and typically performs very well.

In a COBOl, PL/1, or C application, you can use a file reference variable of type BLOB_FILE, CLOB_FILE, or DBCLOB_FILE to insert a LOB or XML value from a file into a DB2 table. LOB file reference variables move LOB or XML values from an application to the database server without going through the application’s memory. Their usage is described here:

http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.apsg/db2z_lobfilereferencevar.htm

http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db29.doc.apsg/db2z_xmpdeclarefilereferencevar.htm

 

In DB2 LUW, a good way to move XML documents from a file system into a DB2 database is to use the DB2 LOAD or the DB2 IMPORT utility. Alternatively, applications can use custom code to read XML documents from their local file system and bind them to parameter markers of inserts statements that are then submitted to DB2.

In some cases you may want to insert XML files that reside in the file system of the DB2 database server – directly with insert statements, without using LOAD or IMPORT. If you have file access permissions on the database server (which might be restricted) then you can use simple user-defined functions (UDFs) in your insert statements to read one or multiple files from the file system. Here is a package “XMLFRomFile.zip” with examples of such UDFs.

The XMLfromFile functions in this package are described in section 4.1.2 of the DB2 pureXML Cookbook and also in the readme file that is included in the package. Some of most useful functions include:

  • blobFromFile: Reads a file from the DB2 server’s file system and returns the file contents as a BLOB. If this BLOB contains a well-formed XML document, it can be inserted into an XML column.
  • clobFromFile: Reads a file from the DB2 server’s file system and returns the file contents as a CLOB. If this CLOB contains a well-formed XML document, it can be inserted into an XML column.
  • directoryInfo: Returns a table with information about files in a directory in the DB2 server’s file system.
  • blobsFromZipURL/clobsFromZipURL: Table functions that read a ZIP file from a URL or file and return a table that contains each file from the ZIP in a separate row, as a BLOB or CLOB, respectively. Similar functions exist for gzip files.

And here are some examples of how you might use these functions.

Read the XML file “book07.xml” and insert it into the table called “shelf” where the column “bookinfo” is of type XML or BLOB:

INSERT INTO shelf(id, bookinfo)
VALUES(7,blobFromFile(‘/home/mydata/book07.xml’));

List all files that are located in the directory c:\xml\book:

SELECT filename, size, modtime
FROM TABLE(directoryInfo(‘c:\xml\book’)) ;

Insert all XML files from the directory c:\xml\book:

INSERT INTO shelf(name, bookinfo)
SELECT t.filename, blobFromFile(t.filename)
FROM TABLE(directoryInfo(‘c:\xml\book’)) t
WHERE isDirectory = 0 AND filename LIKE ‘%.xml’ ;

Insert all XML files from the ZIP file allbooks.zip:

INSERT INTO shelf(bookinfo)
SELECT doc FROM
TABLE(blobsFromZipURL(urlFromFile(‘c:\xml\book\allbooks.zip’)))
WHERE filename LIKE ‘%.xml’OR filename LIKE ‘%.XML’ ;

Note that the ZIP file does not get unzipped to the file system. Instead, the documents are read from the ZIP file and inserted directly into DB2 without uncompressing them in the file system.

About these ads

5 Responses to “How to insert XML documents from the file system?”

  1. Jyoti Says:

    Hi Matthias,

    I am working on a DB2 project as an engg assignment. I have followed ur material for a lot of stuff. Is there an OR operator in pureXML? or an IN operator? I am getting no material or documentation on this. Can you help me with the syntax of it and can it be used within fn:contains.

    Regards,
    Jyoti Mehta

  2. Jyoti Says:

    Thank you Matthias. The last one is what I was looking for but my OR condition is between two different attributes: as in
    If my XML is such

    hhh
    yyy 10101

    uuu

    qqq 6666

    In this case how would I do an OR / AND of HQ city and Home city using fn:contains

  3. Peter Odéus Says:

    Hi Matthias.

    Regarding “blobsFromZipURL”: I need to find out whether a “compressed XML” approach is applicable in a jdbc scenario.

    The thing is: In order to avoid 100Mb XML documents arriving from MQ to our Websphere application server and further on towards DB2, we are looking for an end-to-end solution where memory footprint is minimized.

    By letting heavily zipped XML documents flow from MQ to our app server, I guess we can do validation etcetera using SAX/StAX, piping the XML document via an inflating stream. All well, so far.

    But when it comes to the eventual transactional jdbc-insertion into the database, I need the compressed XML document to be inflated only when it arrives at the DB2 server.

    Is it possible to do this in java/jdbc/DB2, while maintaining a transactional behaviour?

    All the best!

    /Peter Odéus


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 52 other followers

%d bloggers like this: