Quick Start for Persisting XML Standards-Compliant Data
August 18, 2008
XML-based standards have emerged in many industries. For instance, there is ACORD in insurance, FIXML in financial services, NIEM in government, and so on.
Are you evaluating options for persisting standards-compliant XML data? If so, you should know about a great resource. As you know, you can freely download IBM DB2, which is a data server for both relational and XML data. Well, IBM has also made available working demos for a number of XML standards, including ACORD, FIXML, FpML, MISMO, NIEM, OTA, TAX1120, TWIST, UNIFI, and more. The demos show end-to-end XML data exchange, together with data retrieval via RESTful Web services, Atom feeds, and XForms.
You can see the demos for yourself at:
http://services.alphaworks.ibm.com/DB2pureXMLDemo/Demo.html
And you can download sample data and demo scripts at:
http://www.alphaworks.ibm.com/tech/purexml/download
Webcast: A primer for storing and retrieving XML data
August 7, 2008
If you are looking for an introduction to the topics of storing and retrieving XML data, this free on-demand webcast is hopefully a good place to start:
On demand Webcast: A primer for storing and retrieving XML data
Native XML Storage Reduces Development Costs
August 5, 2008
I have mentioned quite a few times how native XML storage makes life easier for database administrators and application developers. I thought I’d take a few moments to go into a little more detail.
By storing the XML directly in the database, you avoid the need to shred the XML into a relational schema. Shredding is also know as decomposition. Before shredding, you need to design a relational schema for the data. This can quite often be a laborious process. Sometimes this can be automated with off-the-shelf tools. However, you should keep in mind that the resulting tables will almost certainly need to be carefully examined, and possibly optimized. After designing the relational schema, you then need to set up the environment that actually maps the XML to the relational schema. And finally, you will need to develop and test code for using the data, which is typically quite complex because of the need for unwieldy SQL statements with multiple JOIN statements.
It sounds like there is a significant overhead for shredding XML data into a relational schema. But this is only a part of the story. You also need to consider what happens when the XML schema changes. And XML schema changes are an unfortunate reality for many of us. When the schema changes, it can play havoc with your relational schema, your mapping process, and the code for your applications that use the data. It is in dealing with these updates that many organizations are realizing the greatest gains from adopting native XML storage.
If you want a real-world example of the potential impact of dealing with relational tables for XML data, consider the FpML industry standard. With native XML storage, dealing with FpML messages is as straightforward as storing and retrieving the message. However, if you use shredding to store FpML messages, in some implementations you need to work with more than 475 separate database tables.
Some organizations have measured the impact of native XML storage on the workloads of database administrators and application developers. Here are some of the findings that I am aware of:
- A large European financial services company use DB2 pureXML (pureXML is IBM DB2’s native XML storage feature) to reduce the number of lines of code for writing to and reading from their database by 65%. This reduces the amount of code they must develop, test, and maintain and allows their developers to be more productive.
- This same company reduced the amount of time needed to develop services by 75%, thus allowing them to take on additional development projects and improve the services they provide.
- They also determined that schema changes are now a breeze. In the past with shredding, adding a field took a day of work (development and test), and a week of real time because of the processes involved with database changes. Now all that is needed is to change the pointer to the schema in a DB2 XML configuration file, which takes 5 minutes.
- A leading chemical company determined that by using the IBM database tools and pureXML technology they are saving 50-75% on development costs, while improving developer productivity by 25-50%.
- And finally, a major American financial services company studied their environment and determined that by using the native XML storage in DB2 pureXML they reduce their resource requirements by 30%.
In any business, these are improvements that cannot be ignored.
When to Store in XML Format
August 4, 2008
XML is unchallenged as a data exchange format. Most industries have developed standards that use XML as the data format. The list of standards is seemingly endless: ACORD, FIXML, FpML, MISMO, HL7, NIEM, OTA, SVG, TAX1120, UBL, XBRL, and so on.
Your immediate inclination is likely to store such information in its native XML format. After all, when the information is exchanged, it is expressed in XML format. Therefore it seems natural that it would also be stored in a native XML format. Often this is the case. However, it is not always the case. Sometimes it makes more sense to store the information in relational form. This blog post discusses when to use the two approaches.
The key question that you need to ask yourself is whether the information is best expressed in a relational format or in an XML format. Because XML is so popular as a platform-neutral and transparent data exchange format, data is often sent as XML even when XML is not the ideal format for the information. For instance, information is sometimes force-fitted into a hierarchical data structure even when it has no hierarchical characteristics. If this is the case, examine the data to see if it should in fact be stored in a relational format. The general rule is that you should consider a relational structure when the data is better described in tabular format. This is the characteristic that most readily defines the relational schema.
There are also situations when you may be compelled to use a relational schema. For example, if the application that consume this information work only with relational data, then you need to take this into consideration.
You should store information in XML format when:
- The information is best represented in a hierarchal format.
When you store hierarchical information in a relational database, you often require a large number of tables. Creating and maintaining such systems is complex and costly. Multiple joins result in complicated SQL statements. In such cases, you should instead use native XML storage.
- The schema is subject to change.
Because relational tables are highly-structured, changing the schema is both difficult and costly. Updating a schema for natively stored XML, on the other hand, is a relatively painless exercise. If you expect the schema to change, you should use native XML storage.
- There are many null values in the relational tables.
Relational database tables set aside storage for each column. If your information does not populate all of these columns, you can have a large number of null values in your database. This, in turn, affects both storage requirements and performance. Again, you should consider native XML storage in such situations.
Getting Started with Native XML Databases
July 27, 2008
Many people come here while searching for the best native XML database. Being employed by a vendor with a leading native XML database, I cannot make impartial judgements in this regard. So I’m not going to try. I can, however, make sure you you know how easy it is to evaluate my employer’s native XML capabilities, so you can make up your own mind.
Not only does IBM offer a production-quality native XML database for no charge. It also provides publication-quality books about it for free. This makes it very easy for you to get started with XML databases.
DB2 9 Express-C has no data storage limits. You can store as much data as you like in the database. DB2 9 Express-C has no evaluation time limit. You can use it as long as you like. The only limits are that you use a server with a maximum of 2 cores and 2GB of RAM. Download it from the DB2 Express-C Web page.
IBM has published two books about native XML storage. These books are available for purchase. However, you can also freely view these books as HTML or download the PDF version for offline viewing and printing. Click on images below to see these books:
![]() |
![]() |
|
| DB2 9 pureXML Guide | DB2 9: pureXML Overview and Fast Start |
Good luck with your evaluations.
XQuery versus SQL/XML, Part Deux
July 25, 2008
Some of you have been asking for more information about XQuery versus SQL/XML. In particular, it appears that you are interested in understanding the levels of support for common operations. I’ll take a few moments to compare both XQuery and SQL/XML in IBM DB2 9. However, please note that not all vendors provide the same levels of support. For instance, all vendors do not support sub-document update, and those that do support it may not implement the XQuery standard. So please, before making any decisions, verify the levels of support provided by your vendor.
| Operation | XQuery | SQL/XML | Comments |
| Inserting an XML document | No | Yes | You use SQL to insert an entire XML document. |
| Retrieving an XML document | Yes | Yes | |
| Retrieving part of an XML document | Yes | Yes | |
| Using predicates with relational data | Yes | Yes | XQuery does not support relational predicates. However, IBM DB2 supports SQL in XQuery, allowing predicates with relational data. |
| Using predicates with XML data | Yes | Yes | |
| Deleting an XML document | No | Yes | You use SQL to delete an entire XML document. |
| Updating an XML document | Yes | Yes | |
| Updating part of an XML document | Yes | Yes | |
| Joining XML data | Yes | Yes | Using XQuery is the easier approach. Using SQL/XML is typically difficult to code. |
| Joining XML wth relational data | Yes | Yes | XQuery does not support joins to relational data. However, IBM DB2 supports SQL in XQuery, allowing joins to relational data. |
| Transforming XML | Yes | Yes | Using XQuery is the easier approach. Using SQL/XML is typically difficult to code. |
| Aggregating XML data | Yes | Yes | Using SQL/XML is the easier approach. Using XQuery is possible with embedded SQL, but is typically difficult to code. |
| Calling external functions | No | Yes | |
| Passing parameter markers | No | Yes |
At first glance, it may appear that SQL/XML has more extensive support. However, this is in part because logically-speaking certain tasks do not belong in XQuery. Also note that some tasks are easier to code with XQuery. This ease of coding can make a significant difference in some environments.
Questions for XML Database Vendors
July 11, 2008
Are you evaluating XML database vendors? If so, here is a list of questions that can help you when you evaluate vendors. Of course, some questions may not apply to your situation. For instance, update capabilities may not be necessary in audit and logging systems. You can weed questions out that do not apply to you.
Performance:
- Ask about the performance when inserting XML data into the repository. I came across one customer who unfortunately went with a vendor that could not keep up with with their database ingest needs and had to switch to IBM. XML query performance alone is often not a sufficient measure. In some systems you may have to use heavy indexing for good query performance, but these indexes then lead to significant overhead for insert, update, and delete. In fact, you should also probably ask about the overhead incurred when working with indexes.
- Ask about the query performance. Different types of queries have vastly different characteristics, so make sure that the performance proof points they give match your situation. For instance, are the performance proof points using the same kind of data you use, are they at the same granularity as your typical queries, and are they working across a similar data set to yours.
- Ask if their XML performance proof points are publicly available with sufficient detail about the test data, workload, hardware used, etc to verify their claims.
- Ask about any restrictions there are on their XML indexes. Can all data types be indexed?
- Ask about the scalability limits for the database.
- Ask for proof points on the reliability of the database.
Schema Support
- If you work with multiple XML schema, ask about their schema handling capabilities. For instance, do you need it to support multiple schema? Or do you need it to support different versions of the same schema in the same column? It is important to clarify these requirements up-front.
- Schema updates are inevitable. Ask what is involved when you have new versions of schema. Is it a seamless experience, or does it require a significant migration effort?
- In fact, I would recommend validating that the vendor can work with your schema up-front, especially if it is complex. I have heard of situations where people have had issues with other vendors in this regard.
- Or perhaps you have schema-less documents. If so, do all of their features support such documents?
Language Support
- If you plan to use SQL, make sure that their SQL/XML function can meet your needs.
- Similarly, if you plan to use XQuery, make sure their XQuery implementation can meet your needs.
- Confirm whether XQuery is embedded in SQL, or whether XQuery can be used standalone and via an API. This may be important to you.
- Are XML updates important to you? If so, ask about their support for update capabilities. And ask if there are any limitations in their update support. I understand that certain vendors have limitations in this regard. In particular, you will want to ask if they support the XQuery Update Facility that is being standardized by the W3C.
Miscellaneous
- Ask if there is a no-charge version of the database for pilot projects. XML features can look wonderful on paper but may be more difficult to use than you expect. Limitations in functionality and usability are not obvious from documentation and white papers, but are revealed when start doing hands-on work with your own XML data.
- If you need your reports and applications to also work with legacy relational data, ask how easy it is to work with XML and legacy relational data.
- Do you need to work with digital signatures? If so confirm that the digital signatures can be validated against retrieved documents.
- Ask to what extent they are standards-compliant.
- If high availability is important, ask if they offer high availability features.
Thanks to Matthias Nicola with his help in compiling this list. And good luck with your selection process. Cheers!
Viral Video - Wednesday
July 1, 2008
And now, the third installment. I hope you enjoy…
XQuery versus SQL/XML
June 26, 2008
XQuery and SQL/XML are two standards-based languages for retrieving information from XML. Many XML storage vendors support for both standards. Although, as is typical for standards implementation, those vendors have varying degrees of support for the standards.
Recently, some people asked me whether XQuery or SQL/XML will win the XML retrieval wars. This question surprised me. You see, I believe there is a valid need for products to support both.
XQuery is a W3C Recommendation. It is supported by vendors like IBM, Oracle, and Microsoft. The language includes features like variables, data types, operators, conditional expressions, and functions. It uses XPath expressions to select information from XML. So XQuery is, in essence, a new language for many people to learn.
SQL/XML, on the other hand, is a set of extensions to SQL. It consists of the XML data type, a collection of XML publishing functions, conversion functions, schema validation functions, and more. SQL/XML was developed by INCITS H2.3, with participation from Oracle, IBM, Microsoft, and others. So SQL/XML is, in essence, an extension to an existing language.
There is a valid need for both.
Some developers are already comfortable with XML development. These developers will easily adapt to XQuery. Also, in many circumstances, XQuery offers developers a strong combination of programming power and ease of use. Finally, XQuery offers strong performance for many XML tasks. Although, each implementation of XQuery and its accompanying database are different, so please verify your vendor’s performance in this regard.
SQL/XML, on the other hand, is ideal for environments where developers are comfortable with SQL programming. When you consider the maturity of the SQL language, together with the strong API support and domain knowledge for query optimization, you realize that SQL/XML is ideal for certain environments. It may take more effort to code certain XML tasks in SQL/XML, but that may be acceptable in some environments.
So I believe that certain environments will favor one language over the other, that there is no reason why both languages can’t exist, and that it is good to allow people to choose the language that suits them best. What do you think?
Viral Video - Tuesday
June 20, 2008
Here is part 2. Enjoy…



My name is Conor O'Mahony. I lead XML product strategy for Data Management at IBM. These postings are my opinions and do not necessarily represent IBM’s positions, strategies, or opinions.