DB2 pureXML – Rich in Proteins!
September 12, 2011
How much protein is in a steak? Well, a 6-ounce steak can contain about 38 grams worth of protein.
How much protein does salmon contain? Approximately 34 grams of protein in a 6-ounce piece of salmon.
How much protein can you find in lentils? A cup of cooked lentils has 18 grams of protein.
And how much protein can you find in DB2 pureXML? Tons! Hundreds of gigabytes, and soon more than 1 TB of proteins!
It’s true, DB2 pureXML is an excellent source of proteins. DB2 can store, compress, and index the proteins so that you always find the right ones quickly when you need some.
What distinguishes DB2 from a steak is that DB2 stores all proteins in XML or in a hybrid XML/relational format! This enables members of the biological research community to analyze and compare the structure and composition of protein molecules. The findings can help them explain diseases, develop new drugs, or understand the interactions between different proteins.
The protein data is publicly available from the Protein Data Bank (PDB), which is a worldwide repository of structural protein data. To facilitate data exchange and flexibility, the data is available in XML.
Typically one XML document describes a single protein molecule. Such a document includes detailed information about all the atoms that the protein consists of, which can be hundreds or sometimes thousands of atoms. And to represent the structure of the protein, the 3-dimensional spatial coordinates of each atom are included as well, resulting in XML documents that can be hundreds of MB in size to describe a single protein.
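To give a feel for what "atoms with 3-dimensional coordinates" looks like in XML, here is a tiny Python sketch. It is only an illustration: the element and attribute names (`protein`, `atom`, `x`, `y`, `z`) are made up for this example and are not the actual PDB XML schema, and real documents are of course vastly larger.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical document. The tag and attribute names are
# illustrative only and do not follow the real PDB XML schema.
SAMPLE = """
<protein id="demo">
  <atom id="1" element="N"><x>11.0</x><y>2.0</y><z>-3.0</z></atom>
  <atom id="2" element="C"><x>12.0</x><y>4.0</y><z>-1.0</z></atom>
  <atom id="3" element="O"><x>13.0</x><y>6.0</y><z>1.0</z></atom>
</protein>
"""


def read_atoms(xml_text):
    """Return a list of (element, (x, y, z)) tuples, one per <atom>."""
    root = ET.fromstring(xml_text)
    atoms = []
    for atom in root.iter("atom"):
        coords = tuple(float(atom.findtext(axis)) for axis in ("x", "y", "z"))
        atoms.append((atom.get("element"), coords))
    return atoms


def centroid(atoms):
    """Average the coordinates of all atoms (the geometric center)."""
    n = len(atoms)
    return tuple(sum(coords[i] for _, coords in atoms) / n for i in range(3))


atoms = read_atoms(SAMPLE)
print(len(atoms), "atoms, centroid:", centroid(atoms))
```

With three atoms this is trivial. With thousands of atoms per document and a very large number of documents, parsing files one at a time like this stops being practical, which is where a database that can index and query the XML comes in.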
Searching and analyzing such large amounts of complex information is a challenge. To tackle this challenge, researchers in Germany decided to harness the hybrid XML/relational capabilities of DB2 to efficiently store and query the protein data. I was happy to help them get the most out of DB2 pureXML.
The article “Managing the Protein Data Bank with DB2 pureXML” describes the database design and optimization that facilitate protein data analysis in a scalable manner. Even if you’re not a biologist (I’m not!), the article is an interesting case study in how a real-world data management problem has been mapped to the hybrid features of the DB2 database system.