March 26, 2010
The handling of XML data has become increasingly important in ETL (extract, transform, and load) processes. Here are some typical use cases:
For example, you might be accumulating XML messages in a transactional database and may need to extract selected values from these messages and add them to a relational data warehouse.
In other cases you might have to move XML data from one system to another and wish to transform the XML documents in the process. Depending on the predominant usage patterns in the target system, you might want to cut large documents into smaller XML fragments, extract only specific sections from each XML document, or rename XML elements. The source as well as the target of such transformations can be XML files in the file system or tables (with XML columns) in a DB2 database.
You can certainly code many useful transformations in SQL/XML, but you might prefer to use a visual tool to design ETL data flows. One such tool is the IBM InfoSphere Warehouse Design Studio.
A new article on developerworks describes how to use the InfoSphere Design Studio to design ETL flows that involve XML data. As a concrete example, the article walks you through the steps of building a data flow that reads from an XML source table and populates two target tables in a data warehouse. One of the target tables contains only relational data, while the other contains both relational and XML data. Read the article here:
This article is a two-part series and I’ll let you know when part 2 is published.