XML in Oracle 11g

September 30, 2008

Here are some observations about the XML storage capabilities in Oracle 11g. These observations were deduced from public sources. Please be aware that I work for IBM who competes directly with Oracle in this regard.

Oracle provides three options for storing XML data:

  • Unstructured, which is essentially Character Large OBject (CLOB) storage. Like any CLOB implementation, you will need to retrieve and parse the XML data before executing XPath and XQuery statements, which has a query-time performance impact.
  • XML-Object-Relational, which shreds the XML data into object-relational tables. There are multiple storage options to choose from (which I will cover in a moment). Oracle recommends this option for data-centric use cases. Of course, with this option, retrieving the original XML data will incur a performance hit as the data is re-composed.
  • XML-Binary, which stores the XML data as a token stream in a Binary Large OBject (BLOB). Oracle recommends this option for document-centric use cases.

If you use XML-Object-Relational, which is also known as Structured or Schema-based storage, you have five different options to choose from for storing repeating elements:

  • Store in CLOBs
  • Store in VARRAY as LOBs
  • Store in VARRAY as nested tables
  • Store in VARRAY as XMLType pointers to BLOBs
  • Store in VARRAY as XMLType pointers to nested tables

If your head is spinning with the different options, I don’t blame you.

Oracle provides a special index called XMLindex, which indexes the internal structure of XML data. Actually, this index is a table. There is an interesting post on the Oracle Discussion Forum. In this post, an Oracle user describes their experiments with XBRL data in Oracle 10g and 11g. In their experiments, case 3 runs on Oracle 11g without XMLIndex in 9.6 seconds, while case 4 runs with XMLIndex and takes 574 seconds. So, in this user’s experience, running with the XMLIndex results in approximately a 50x slowdown.

Note that if you want to create an index for numeric or date values in Oracle 11g, you must use stored procedures, which create separate indexes that you must asynchronously maintain.

Oracle now has an XML update function. It is not compliant with the W3C XQuery Update Facility. Although, Oracle did recently announce the XQuilla XQuery engine, which they claim “will (maybe) free the way for the W3C XQuery Update Facility 1.0 candidate specification / implementation, which is embedded in the XQuilla XQuery engine, for other Oracle products”. So it is possible that a future release of Oracle may support the W3C XQuery Update Facility.

The compendium of XML storage options in Oracle 11g are, in my opinion, essentially based upon existing relational infrastructure. This is very different than the approach that IBM has taken, where they have truly built native support for XML data into their database from the ground up.

Please note that these are solely my personal opinions and not necessarily those of my employer IBM.

About these ads

2 Responses to “XML in Oracle 11g”

  1. Mark Says:

    Conor

    This post is disappointing..

    1. The OTN Post you refer to is over 12 months old.

    2. The post shows exactly the wrong way to deal with XBRL. If you’d read and understood the use cases it is clear that XBRL is highly structured XML so using an index designed to optimize queries on unstructured is, by definition, incorrect practice.

    3. Non of our XML indexes require asynchronous maintenance or our asynchronous by default. We offer the option of asynchronous indexes for certain uses cases where it is appropriate to provide our users with maximum flexibility.

    4. In the 8 years I’ve been dealing with XML the one thing I’ve learnt is that one size does not fit all. Oracle is the only vendor to offer flexible storage models that allow you to pick a storage model that will meet the needs of the application

    Thanks

    Mark


  2. Hi Mark,

    Thanks for your comments and clarifications. I appreciate your input.

    One possible interpretation of the discussion forum post is that people often struggle to figure out the best approach when presented with multiple options.

    Talking in general terms for a moment… I know that I, in past jobs, have struggled with product design trade offs between configurability and ease-of-use. Its always an interesting challenge. But I’m digressing.

    XML storage is indeed an interesting challenge. I hope that some day we have a set of industry benchmarks that we can use to accurately compare performance. The challenge will be to devise a set of benchmarks that covers the wide range of XML use cases, and to have the vendors endorse them.

    Of course, if someone is making a product decision, they should never base that decision on an isolated data point like the one mentioned above. After all, it is highly unlikely that any one data point, regardless of the details, will match someone’s environment.

    Cheers,
    Conor.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 50 other followers

%d bloggers like this: