August 30, 2010
Many enterprises hold significant amounts of relational data in their databases. New applications and integration scenarios often require this data to be communicated in the form of XML messages, e.g. in service-oriented architectures (SOA), web services, application integration (A2A), or data exchange with internal and external parties (B2B). Fortunately, the SQL language contains a set of functions that let you convert relational data into XML, right in your SQL queries that read the required data. This is easier and more efficient than writing complex application code.
Some of the most important SQL/XML construction functions include:
- XMLELEMENT, to construct an XML element from a relational value
- XMLATTRIBUTES, to add attributes to a constructed element
- XMLFOREST, to construct a sequence of elements
- XMLNAMESPACES, to add a namespace to an element
- XMLAGG, to aggregate multiple XML elements that stem from separate relational rows into a single block of XML
In many cases it is particularly convenient to encapsulate XML construction in a function or a view that applications can simply use in their SQL statements without having to be concerned with the details of the XML construction itself.
Let’s look at a few simple examples. First, assume that we have the following sample table with 3 rows of data:
CREATE TABLE employee(firstname varchar(30), lastname varchar(30), dept varchar(5)); INSERT INTO employee(firstname, lastname, dept) VALUES ('John', 'Doe', 'A01'), ('Mary', 'Jones', 'B02'), ('Peter', 'Pan', 'A01');
In a real application, each employee record would have many more columns but for simplicity let’s just work with these 3 columns f0r now. Assume that we need to retrieve each employee record in a specific XML format with a root element called “emp”, an attribute for the department numbers, and two child elements “first” and “last” for the first name and last, respectively. We can define the following user-defined scalar function. This function takes several relational column values as input returns a result of type XML.
CREATE FUNCTION emplxml(first VARCHAR(30), last VARCHAR(30), dept VARCHAR(5)) RETURNS XML RETURN XMLELEMENT(name "emp", XMLATTRIBUTES(dept AS "dept"), XMLFOREST(first, last) ) ;
This user-defined function (UDF) performs the desired XML construction for a given employee record. An application can then use the UDF to easily retrieve the name and department information in XML format:
SELECT emplxml(firstname, lastname, dept) FROM employee; <emp dept="A01"><FIRST>John</FIRST><LAST>Doe</LAST></emp> <emp dept="B02"><FIRST>Mary</FIRST><LAST>Jones</LAST></emp> <emp dept="A01"><FIRST>Peter</FIRST><LAST>Pan</LAST></emp> 3 record(s) selected.
Now let’s assume we need one XML document per department instead of one XML per employee. That means that we need to group the employee rows by department and aggregate the rows for each department into a single XML document. We can probably do that in a table function, but it can also be done conveniently in a view:
CREATE VIEW deptxmlview(deptID, deptdoc) AS SELECT dept, XMLELEMENT(NAME "Department", XMLATTRIBUTES (dept AS "name" ), XMLAGG( XMLELEMENT(NAME "emp", lastname) ) ) FROM employee GROUP BY dept;
Again, an application does not need to be concerned with details of the XML aggregation and construction. It just needs to know that it can read one XML document per department from the view:
SELECT deptdoc FROM deptxmlview WHERE deptID = 'A01' or deptID = 'B02'; <Department name="A01"><emp>Doe</emp><emp>Pan</emp></Department> <Department name="B02"><emp>Jones</emp></Department> 2 record(s) selected.
If you need to build more complex XML documents from your relational tables, you can certainly join multiple tables and apply the SQL/XML construction functions to columns produced by the join. You can also encapsulate pieces of XML construction in views and UDFs and use them as building blocks to construct bigger pieces of XML in a modular fashion.
August 22, 2010
I have been asked many times: “What is faster, XML or Relational?”. Of course, this question oversimplifies a complex issue, and so the only valid answer is “It depends!”. Sometimes people ask the same question in a slightly different way: “If I have a relational table and convert each row into a small XML document and store these documents in a separate table with 1 XML column, what’s the performance difference between the two tables (for inserts/updates/queries)?”. But, in most cases such a conversion is not recommended and this type of comparison is, again, too simplistic.
Let’s say you want store, index, and query 1 million addresses and each address has a first name, last name, street, city, state, and zip code. That’s a simple and fixed structure and it’s the same for all records. It’s a perfect fit for a relational database and can be stored in a single table. Relational databases have been optimized for decades to handle such fixed records very efficiently. However, if the application needs to convert the address data to XML format anyway, it can often be faster to store the data permanently in XML and avoid the repeated conversion from relational format to XML format in the application.
Now consider a scenario where the business objects of interest are a lot more complex and variable than simple addresses. For example, derivative trades in the financial industry are modeled in XML with an XML Schema called FpML (financial products markup language). It defines more than 7000 fields (many are optional) with hundreds of 1-to-many relationships between them. Designing a relational schema to represent such objects is very hard and leads to hundreds of tables. The process of inserting (normalizing) and reading (denormalizing) a single object into such a relational schema can easily be 10x or 100x slower than inserting and reading a corresponding XML document in a native XML column (e.g. in DB2 pureXML).
So, any performance comparison of XML versus relational depends heavily on which data you choose for the comparison, and what type of operations you measure.
A large DB2 customer recently compared XML to relational performance because their business objects are currently mapped to 12 relational tables. Their application executes at least 12 SQL statements to retrieve all the relational data that comprises one of the logical business objects. Then the application reassembles the original business object. An alternative is to store these business objects as XML so that each object is stored as a single document. Instead of 12 tables, only 1 table with 1 XML column is then needed. In multi-user tests for data retrieval the the company found that the XML-based solution allows them to retrieve objects with 55% higher throughput than the existing relational SQL-based solution. The reasons for the performance benefit include fewer JDBC interactions between application and database as well as fewer distinct pages that need to be read when one logical business object is represented as one XML document (and not scattered over 12 tables). These tests were later repeated and verified at an IBM lab.
Another important considertion in the question of XML versus relational is the data format in which the data is produced and consumed outside the database. If the data is produced and/or consumed in XML format anyway, it is often better to also store the data as XML in the database.
So, the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at. And there are also use cases (with high schema complexity and schema variability over time) that cannot reasonably be implemented in a relational data model. (If you need to go to Hawaii, the boat always beats the truck!).
The beauty of a hybrid database system such as DB2 is that you can use both native XML and relational capabilitues side by side, in a tightly integrated manner. Some data is better representated in relational tables, other data is better represented in XML, and you can manage both in the same database or even the same table, and with the same APIs and utilities.
August 11, 2010
How to order query results based on XML values…. this seems like a simple question, doesn’t it? It’s indeed not hard, but there are a few useful things to know about it.
Assume we have a set of XML documents that describe “employee” information, and we want to return the employee names (/emp/name) ordered by the employees’ ID numbers (/emp/@id).
This is very easy to write in XQuery:
for $i in db2-fn:xmlcolumn("EMPLOYEE.DOC")/emp order by $i/@id return $i/name/text()
In this example the employee documents are stored in the XML column “DOC” in the table “EMPLOYEE”. The order by clause reorders the sequence of “emp” elements based on the @id attribute, and that’s the order in which the names. At the end of the order by clause you can add the keyword ascending or descending to choose a sort order.
Now, let’s assume we are querying a mix of XML and relational data, which is a common requirement, and want to do the same ordering in a SQL/XML query. As a first (but incorrect) attempt we might write the following query:
SELECT XMLQUERY('$i/emp/name/text()' PASSING doc AS "i") FROM employee ORDER BY XMLQUERY('$i/emp/@id' PASSING doc AS "i"); SQL20353N An operation involving comparison cannot use operand "Ordering column 1" defined as data type "XML". SQLSTATE=42818
The error message SQL20353N produced by DB2 tells us that we cannot order on a column of type XML. In this example, the XML column that we try sort by is produced by the XMLQUERY function in the SQL order by clause. The XMLQUERY function always produces values of data type XML.
The SQL/XML standard defines that values of type XML cannot be compared to determine whether one is greater than the other. This restriction makes sense, because an individual value of type XML can be a full document, a document fragment, a sequences of multiple values, or a single atomic value. In SQL/XML there is no well-defined way to decide, for example, whether one XML document is greater than another, or whether an XML document is greater than 5.
Since comparison is at the heart of ordering, ordering on XML values in an SQL order by clause is not possible – UNLESS you cast the XML values to a data type that SQL can reasonably compare and order. The following query converts the extracted @id attribute to INTEGER, which allows the sorting as you would expect:
SELECT XMLQUERY('$i/emp/name/text()' PASSING doc AS "i") FROM employee ORDER BY XMLCAST( XMLQUERY('$i/emp/@id' PASSING doc AS "i") as INTEGER);
Another option is to sort on a column produced by the XMLTABLE function:
SELECT empid, empname FROM employee, XMLTABLE('$i/emp' PASSING doc AS "i" COLUMNS empname VARCHAR(25) PATH 'name', empid INTEGER PATH '@id') ORDER BY empid ;