I recently described how vertical solutions are the best way to bring emerging technology to market. Today, I provide some guidelines for determining vertical solution areas. See the presentation embedded below for details (if you subscribe to email updates, you need to view the post in a browser to see the presentation).

Because the criteria for evaluating vertical solution areas depend on a vendor’s development capabilities and routes to market, I cannot propose a definitive list of vertical solutions. If you are interested in drawing up such a list, I will leave it as an exercise for you…

Storebrand Group is Norway’s oldest and one of its biggest financial services companies, and a leading player throughout Scandinavia. The company provides life insurance, pension products, commercial retail banking, and asset management to many of Norway’s largest companies, as well as to private individuals, municipalities and public sector entities. I thought it would be good to spend a few moments to dig deeper into some of their performance improvements:
  • Order processing time has been reduced. For example, an application for a license to implement a pension plan previously took up to three weeks. It can now be completed in 10 minutes.
  • Faster processing gives Storebrand the ability to handle five times the number of customer orders. Much of the manual data re-entry done by individual departments has also been eliminated, leading to fewer mistakes, higher quality and more efficient customer service.
  • In the past, when storing XML in a CLOB, certain reports took between 24 and 36 hours to run. Now with DB2 pureXML, these same queries take between 20 seconds and 10 minutes to run.
  • The time it took programmers to prepare for a search process was shortened from one week to half a day.
  • In the past, programming the search processes took 2 hours with shredding and 8 hours with CLOB-stored data. Now with DB2 pureXML, these same programming tasks take 30 minutes.
  • Updating the schema is also much quicker. Adding a field to the shredded data took a day of work (development and test), but a week of real time because of the processes involved with database changes. With DB2 pureXML, all that is needed is to change the pointer to the schema in an XML configuration file, which takes 5 minutes.
  • Storebrand also achieved a 65 percent reduction in the amount of I/O code by converting 20 of its services to pureXML.
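As a rough check on the scale of these improvements, the short Python sketch below turns some of the figures quoted above into speed-up factors. The only inputs are the numbers in the list. The helper name `speedup` is hypothetical, and pairing the ends of the quoted ranges to get a conservative and an optimistic figure is an assumption about how to read them.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new process is (hypothetical helper)."""
    return before_seconds / after_seconds


HOUR = 3600
MINUTE = 60

# Reports: 24 to 36 hours with CLOB storage, 20 seconds to 10 minutes with pureXML.
# Assumption: the conservative reading pairs the fastest old run with the slowest
# new run; the optimistic reading pairs the slowest old run with the fastest new run.
report_low = speedup(24 * HOUR, 10 * MINUTE)
report_high = speedup(36 * HOUR, 20)

# Programming a search process: 2 hours (shredding) or 8 hours (CLOB) versus 30 minutes.
shred_dev = speedup(2 * HOUR, 30 * MINUTE)
clob_dev = speedup(8 * HOUR, 30 * MINUTE)

print(f"Report queries: {report_low:.0f}x to {report_high:.0f}x faster")
print(f"Programming: {shred_dev:.0f}x faster than shredding, {clob_dev:.0f}x faster than CLOB")
```

Even on the conservative reading, the report queries improve by roughly two orders of magnitude.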

Click on the image below to link to a very short video about the value a Norwegian financial services company called Storebrand is realizing by switching to a native XML database. They now have shorter development cycles and more efficient reporting cycles. But most important of all, they are leveraging this technology to realize business benefits like reducing order processing time.

Often when someone brings new technology to market, they try to sell the technology. This is often the wrong thing to do. See the presentation embedded below for details (if you subscribe to email updates, you need to view the post in a browser to see the presentation).

So what does this mean for native XML database technology? Well, first of all, it is not database administrators who will drive the adoption of native XML databases. Nor is it application developers and application architects. Adoption will be driven by Line Of Business (LOB) executives who can leverage this technology for competitive advantage. In a few days, I will describe how to determine the LOB solutions that native XML databases need to initially target.

I recently discovered an excellent treatise on XML and Databases by Ronald Bourret. It is certainly worth reading. Mr. Bourret describes the various options for storing XML in databases. He claims that the most important factor in choosing a database is whether the information is data-centric or document-centric. I will take some liberties and paraphrase his descriptions:

  • Data-centric information uses XML solely as a data transport mechanism. It is not important that the data is, for some length of time, in XML format. Examples are sales orders, flight schedules, scientific data, and stock quotes. For such cases, he recommends using a traditional database, such as a relational, object-oriented, or hierarchical database.
  • For document-centric information, XML fulfills a greater role. In this case, the XML elements, attributes, and structure are meaningful and have value. Examples are books, email, advertisements, and hand-written XHTML documents. For such cases, he recommends using a native XML database or a content management system.

He does say that these rules are not absolute, but he does not go into further detail. Let me describe some reasons why you may want to store data-centric information in a native XML database:

  • Performance improvements. See my previous post for details regarding the performance for queries.
  • Greater business agility. Respond quickly to dynamic conditions by easily accommodating changes to data and schemas.
  • Lower development costs. Reduced code and development complexities lead to shorter development cycles when updating your systems or adding new applications.
  • Improved business insight. Gain competitive advantage through better and quicker access to business insights.
  • Space savings. Databases like DB2 use inlining and compression to realize between 3x and 6x space savings when storing XML data.

I will go into more detail on some of these reasons in subsequent posts.

Matthias Nicola and Vitor Rodrigues wrote an excellent paper comparing the performance of IBM’s native XML storage (called pureXML) to non-native storage. It makes for very interesting reading. I will include some highlights here.

Traditionally, there were two approaches to storing XML data in a relational database: using a CLOB or shredding.

If XML data is stored in a character large object (CLOB) field, the data is typically inserted as unparsed text. This avoids XML parsing at insert time; however, it requires XML parsing at query execution time. This leads to poor search and extraction performance. For instance, look at the following graph (lower numbers are better).

The graph shows the results for five types of queries:

  • Select*, which is full document retrieval of all documents, no predicate
  • 1Pred1Doc, which is full document retrieval of one document matching one predicate
  • 5PredSome, which is full document retrieval of documents matching multiple predicates
  • PartialAll, which is partial retrieval of all documents
  • PartialSome, which is partial retrieval of all documents matching certain criteria

For most queries, native XML storage significantly outperforms CLOB storage. CLOB storage is fast only for full document retrieval, which can ignore the XML structure.
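To make the CLOB trade-off concrete, here is a minimal Python sketch. It is not DB2 and not the setup used in the paper: an in-memory SQLite `TEXT` column stands in for a CLOB, and the sample orders, table name, and function names are all invented for illustration. It only shows where the parsing work falls.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the element names are invented for illustration.
ORDERS = [
    "<order id='1'><customer>Acme</customer><total>250</total></order>",
    "<order id='2'><customer>Birch</customer><total>900</total></order>",
]

conn = sqlite3.connect(":memory:")
# Assumption: a SQLite TEXT column stands in for a CLOB field.
conn.execute("CREATE TABLE orders_clob (doc TEXT)")
# Insert is cheap: the XML goes in as unparsed text.
conn.executemany("INSERT INTO orders_clob (doc) VALUES (?)", [(d,) for d in ORDERS])


def select_all(conn):
    """Full document retrieval: the text is returned as-is, with no parsing."""
    return [row[0] for row in conn.execute("SELECT doc FROM orders_clob")]


def customers_with_total_over(conn, threshold):
    """Predicate plus partial retrieval: every document is parsed at query time."""
    matches = []
    for (doc,) in conn.execute("SELECT doc FROM orders_clob"):
        root = ET.fromstring(doc)  # the parsing cost is paid on every query
        if float(root.findtext("total")) > threshold:
            matches.append(root.findtext("customer"))
    return matches


print(len(select_all(conn)))
print(customers_with_total_over(conn, 500))
```

The database cannot see inside the stored text, so any query that depends on the XML structure has to read and parse every document.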

With shredding, the XML data is mapped to a relational structure (which is then stored in a relational database). Here you can see that XQuery (which is used for the native XML) outperforms SQL querying with XML conversion (which is used for shredding) for most types of query. However, because searching over relational data is faster, the query that retrieves part of the data across all records is faster for shredded data.
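The same kind of sketch illustrates shredding. Again, this is an assumption-laden illustration rather than the mapping used in the paper: SQLite stands in for the relational database, and the table, columns, sample orders, and function names are hypothetical. The XML is parsed once at insert time, partial retrieval becomes a plain relational query, and full document retrieval requires converting rows back into XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the element names are invented for illustration.
ORDERS = [
    "<order id='1'><customer>Acme</customer><total>250</total></order>",
    "<order id='2'><customer>Birch</customer><total>900</total></order>",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_shredded (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)


def shred(conn, doc):
    """Parse once at insert time and map elements to relational columns."""
    root = ET.fromstring(doc)
    conn.execute(
        "INSERT INTO orders_shredded (id, customer, total) VALUES (?, ?, ?)",
        (int(root.get("id")), root.findtext("customer"), float(root.findtext("total"))),
    )


def all_customers(conn):
    """Partial retrieval across all records: a relational scan with no XML work."""
    rows = conn.execute("SELECT customer FROM orders_shredded ORDER BY id")
    return [row[0] for row in rows]


def rebuild_order(conn, order_id):
    """Full document retrieval: the row has to be converted back into XML."""
    row = conn.execute(
        "SELECT id, customer, total FROM orders_shredded WHERE id = ?", (order_id,)
    ).fetchone()
    root = ET.Element("order", id=str(row[0]))
    ET.SubElement(root, "customer").text = row[1]
    ET.SubElement(root, "total").text = f"{row[2]:g}"
    return ET.tostring(root, encoding="unicode")


for doc in ORDERS:
    shred(conn, doc)

print(all_customers(conn))
print(rebuild_order(conn, 2))
```

Note that the mapping is fixed in the table definition, so adding a field to the documents means changing both the table and the shredding code.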

These findings show that, in most circumstances, native XML storage provides significant performance gains over CLOB storage and shredding. However, I have shown only part of the story here. For instance, I did not include performance information for the ingestion of data. I encourage you to read the full paper, “A performance comparison of DB2 9 pureXML and CLOB or shredded XML storage.”


March 19, 2008

Welcome to my XML database blog. In this blog, I will describe the use and adoption of XML databases, concentrating on native XML databases and hybrid XML databases.

Until recently, databases had limited support for XML data. For instance, when putting XML data in a relational database, one typically had to choose between storing it as a character large object (CLOB) or “shredding” the data into smaller pieces and storing them in individual fields. This situation was far from ideal, as both approaches imposed significant limitations on solutions that use the XML data.

To address these shortcomings, a number of databases have emerged with native support for XML data. These databases use XML as a fundamental unit of storage. They also support query languages like XPath and XQuery for working with the XML data. This blog is dedicated to the use of native XML databases and related solutions.
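For readers who have not seen XPath, here is a small sketch of what querying XML by its structure looks like. It uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module rather than a native XML database, and the catalog document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
CATALOG = """
<catalog>
  <book category="database"><title>XML Storage</title><price>40</price></book>
  <book category="web"><title>XHTML Basics</title><price>25</price></book>
  <book category="database"><title>Query Languages</title><price>55</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Path expression with an attribute predicate: titles of books in one category.
database_titles = [
    title.text for title in root.findall("./book[@category='database']/title")
]

# Path expression with a child-element predicate: the book whose price is 25.
cheap_titles = [title.text for title in root.findall("./book[price='25']/title")]

print(database_titles)
print(cheap_titles)
```

The query addresses elements and attributes directly, without first mapping them to columns or parsing text by hand in application code.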