XML ETL with DataStage 8.5

April 13, 2011

Since XML has become so pervasive in enterprise computing and service-oriented architectures, it is a given that XML capabilities are required in all parts of an IT environment. Closely related to the area of databases is ETL: extracting, transforming, and loading information, e.g. to populate data warehouses or to integrate and connect disparate systems.

The good news is that version 8.5 of IBM InfoSphere DataStage has greatly enhanced XML capabilities. I first saw the new XML features in DataStage 8.5 in a demo at the Information On Demand conference last October. And I was very impressed, because the XML support goes far beyond the half-hearted XML handling that many tools offer.

For example, it’s easy to import and work with XML Schemas in DataStage 8.5. Many industry standard XML Schemas that are used in the financial sector, health care, insurance, government, retail, etc. are quite complex and consist of dozens or even hundreds of XSD files that comprise a single XML Schema. Examples include FpML, FIXML, HL7, IRS1120, OAGIS, and many others.

You might receive such a schema as a ZIP file that contains multiple folders with XSD files. DataStage 8.5 can simply read the entire ZIP file, which saves you the tedious job of importing all the XSD files separately or dealing with their import and include relationships.
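
To give a feel for what "dealing with import and include relationships" means, here is a minimal Python sketch that opens a ZIP archive of XSD files and lists which other files each one references. This is only an illustration with the Python standard library, not how DataStage does it internally; the function names and the two tiny demo schemas are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace of XML Schema elements


def schema_dependencies(zip_bytes):
    """Map each .xsd file in the archive to the schemaLocation values
    of its top-level xs:import and xs:include elements."""
    deps = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xsd"):
                continue
            root = ET.fromstring(archive.read(name))
            refs = []
            for child in root:
                if child.tag in (XS + "import", XS + "include"):
                    location = child.get("schemaLocation")
                    if location:
                        refs.append(location)
            deps[name] = refs
    return deps


def build_demo_zip():
    """Build a small in-memory ZIP with two hypothetical schema files."""
    order_xsd = (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:include schemaLocation="../common/types.xsd"/>'
        '<xs:element name="order" type="xs:string"/>'
        "</xs:schema>"
    )
    types_xsd = (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:simpleType name="sku"><xs:restriction base="xs:string"/></xs:simpleType>'
        "</xs:schema>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("main/order.xsd", order_xsd)
        archive.writestr("common/types.xsd", types_xsd)
    return buffer.getvalue()


if __name__ == "__main__":
    for path, refs in schema_dependencies(build_demo_zip()).items():
        print(path, "->", refs)
```

With a real industry schema, the resulting map can have hundreds of entries, which is exactly the bookkeeping you do not want to do by hand.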

Once the XML Schema is imported, DataStage understands the structure of your XML documents and allows you to define the transformations that you need.

The XML transformation capabilities cover the operations that you would expect. For example, you can:

  • compose new XML documents from relational tables or other sources
  • shred XML documents to a set of relational row sets (or tables)
  • extract selected pieces from each XML document and leave other parts of the XML unparsed and “as-is”
  • extend your XML processing by applying XSLT stylesheets to the incoming XML data

And there is also a powerful set of transformation steps that allow you to implement any custom XML transformation that you may need.
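
If "shredding" is a new term for you, the following Python sketch shows the basic idea on a tiny document: one XML order becomes one row for a parent table and several rows for a child table, linked by a key. In DataStage you would define this graphically in the XML stage; the code below is only a hand-written illustration, and the `order`/`item` structure, the column names, and the `shred_order` function are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from any industry schema.
ORDER_XML = """
<order id="1001" customer="ACME">
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="1"/>
</order>
"""


def shred_order(xml_text):
    """Split one <order> document into a parent row and a list of child rows."""
    root = ET.fromstring(xml_text)
    order_id = root.get("id")
    # One row for the ORDERS table.
    order_row = {"order_id": order_id, "customer": root.get("customer")}
    # One row per <item> for the ITEMS table, carrying the order id as foreign key.
    item_rows = [
        {"order_id": order_id, "sku": item.get("sku"), "qty": int(item.get("qty"))}
        for item in root.findall("item")
    ]
    return order_row, item_rows


if __name__ == "__main__":
    order_row, item_rows = shred_order(ORDER_XML)
    print(order_row)
    for row in item_rows:
        print(row)
```

Composing XML from relational data is the reverse direction: join the parent and child rows on the key and nest the children under their parent element.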

A big bonus is the ability to process even very large XML files efficiently. When batches of XML documents are moved between systems, they are sometimes concatenated into a single large XML document that can be gigabytes in size. Such mega-documents typically contain many independent business objects and often need to be split by the consumer. DataStage 8.5 handles documents of 20GB or larger and can even parse and process a single large document with multiple threads in parallel! This is very cool and a big win for performance.
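
To show why splitting matters, here is a minimal single-threaded Python sketch that streams through a batch document and hands out one business object at a time, so the whole file never has to fit in memory. It uses `iterparse` from the Python standard library and says nothing about how DataStage implements its own (parallel) parsing; the `batch`/`trade` element names and the function name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_business_objects(stream, tag):
    """Yield each direct child <tag> of the root element as a string,
    discarding already processed children to keep memory use flat."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0 and elem.tag == tag:
                # strip() drops any trailing whitespace serialized with the element
                yield ET.tostring(elem, encoding="unicode").strip()
                root.clear()  # drop finished children from the in-memory tree


if __name__ == "__main__":
    batch = b"<batch><trade id='1'><amt>5</amt></trade><trade id='2'/></batch>"
    for doc in iter_business_objects(io.BytesIO(batch), "trade"):
        print(doc)
```

Because each yielded object is independent, a consumer could pass them on to separate workers, which is the general idea behind processing one large document in parallel.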

There is a nice 2-part article on developerWorks that describes these XML capabilities in more detail:

http://www.ibm.com/developerworks/data/library/techarticle/dm-1103datastages/index.html

http://www.ibm.com/developerworks/data/library/techarticle/dm-1103datastages2/index.html

And the documentation of the “XML stage” starts here:

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/index.jsp?topic=/com.ibm.swg.im.iis.ds.stages.xml.usage.doc/topics/introduction_to_xml_stage.html


2 Responses to “XML ETL with DataStage 8.5”

  1. dinakaran Says:

    Hi,

    As mentioned in your post, FpML can be processed by DataStage 8.5. We are trying to load the FpML schema into the schema library, but when the schema is used in our jobs, DataStage stops responding and hangs.

    Can you please give us some idea of how to process the FpML schema in DataStage?


    • Hi,
      for starters, please also submit this question in the DataStage discussion forum at http://www.ibm.com/developerworks/forums/forum.jspa?forumID=825. That’s where DataStage experts can help you.
      Also, I think that more information is required. Which version of the FpML schema are you using? And can you describe the “jobs” in which DataStage hangs? Have you tried to simplify your job, or cut it into multiple separate jobs in order to narrow down the specific part where the problem occurs?

      Thanks,

      Mathias

