DB2 Compresses XML Data by 60% to 80%
May 6, 2009
This continues my series of posts about the new features for working with native XML data in the IBM DB2 database software.
Compression reduces the amount of storage space needed for data. Data storage costs money, so minimizing this cost is very important for many organizations, especially when storage costs can be reduced by 60% to 80%. Storage-related costs include the storage devices themselves, the power consumed by those devices, and the time spent maintaining them.
Another benefit of data compression is that it often improves database performance. Because the data requires less disk space, you typically see less disk I/O activity. And because more data fits in the cache, you may also enjoy improved buffer pool hit ratios. In many cases, the performance gain due to reduced I/O and better memory utilization outweighs the extra CPU cycles required to compress and decompress the data.
When storing XML data, DB2 typically places the XML data in a location called the XML Data Area (XDA). However, if the XML data is less than 32KB in size, it can be stored with the relational data (this is called inlining).
With DB2 9.5, you can compress XML data that is inlined, allowing you to reduce storage for XML data. For instance, the XML transactions in the TPoX benchmark are typically smaller than 32KB, allowing them to be inlined and compressed. In the most recent TPoX benchmark, one terabyte of raw XML data is stored in 390 gigabytes of storage, a storage saving of 61%.
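To make the arithmetic behind that 61% figure explicit, here is a minimal Python sketch. The function name storage_savings_percent is hypothetical (it is not part of DB2 or TPoX), and the sketch assumes one terabyte is counted as 1000 gigabytes, which is the reading that reproduces the quoted number.

```python
def storage_savings_percent(raw_gb: float, stored_gb: float) -> float:
    """Return the percentage of storage saved by compression.

    Hypothetical helper for illustration only; not a DB2 API.
    """
    if raw_gb <= 0:
        raise ValueError("raw_gb must be positive")
    return (1 - stored_gb / raw_gb) * 100


# TPoX figures quoted above. Assumption: 1 terabyte = 1000 gigabytes.
tpox_savings = storage_savings_percent(1000, 390)
print(f"{tpox_savings:.0f}% saved")  # prints "61% saved"
```

The same calculation can be applied to your own tables by comparing their size before and after compression is enabled.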
DB2 9.7 extends compression to all XML data, regardless of whether it is in the XDA or inlined. In other words, DB2 9.7 can compress XML data, regardless of size. (The maximum size of an individual piece of XML data that can be stored in DB2 is 2 gigabytes.)
The degree to which XML data can be compressed depends on the nature of the XML data. IBM has tested the new data compression features with six different data sets. Three of these data sets were supplied by IBM clients and represent real-world client usage. The other three are XML data sets available in the public domain. The data sets include XML documents that range in size from 2KB to 100MB. The following diagram shows the storage savings that have been achieved (this diagram is from Cindy Saracco and Matthias Nicola’s article titled “Enhance business insight and scalability of XML data with new DB2 V9.7 pureXML features”).
As you can see, compressing XML data typically results in 60 to 80 percent disk space savings with DB2 9.7.
Finally, I’d also like to mention that if you compress XML data, you can also compress any indexes for that XML data. Compressed indexes also reduce physical I/O and increase buffer pool hit ratios, which often leads to a net performance gain.