Name/Value Pairs – A pretty bad idea in XML as in Relational !
January 21, 2011
Today’s topic is “Name/Value Pairs” – a data modeling approach that is also known as Key-Value Pairs (KVP) or Entity-Attribute-Value model (EAV).
In the relational database world, the Name/Value Pair approach is sometimes used when the number of required columns in a table is very large, unknown, or changing over time. In that case it is difficult to define a table with a fixed set of columns. That’s when some people resort to Name/Value Pairs.
To give an example, consider the following record from a traditional database table with columns id, firstname, lastname, company, and dateOfBirth:
(1, John, Doe, IBM, 1978-11-05)
In a Name/Value Pair table, the same information would be stored across four different rows with columns id, name, and value:
(1, firstname, John)
(1, lastname, Doe)
(1, company, IBM)
(1, dateOfBirth, 1978-11-05)
The flexibility of this table is wonderful (at first!), because you can store records of pretty much any format in such a table, even if the set of required fields is not known in advance or growing over time. Instead of defining fixed column names, you simply stick the field names into the column name and pair them up with a value.
BUT…. it’s well understood that such Name/Value Pair tables have serious problems that cause a lof of headache over time. In particular:
- It’s very hard to define constraints for Name/Value Pair tables, because the logical meaning of the column value changes from row to row. Data quality is hard to enforce.
- Writing business queries against Name/Value Pair tables is complex and often requires many self-joins. Reporting queries with many predicates are particularly difficult and inefficient.
- Column statistics that usually aid database optimizers in the selection of good access plans often do not work well for Name/Value Pairs.
- Mixing values of different data types in the column value causes further complications. The column value needs to be of type VARCHAR which means that correct numeric or DATE comparisons are not straight forward, may require casting, and it’s difficult to define adequate indexes. Note that the numeric comparison value > 5000 is not the same as the string comparison value > ’5000′. The latter would also be true if value contains the string ’7′. A workaround is to define multiple value columns, one for each required data type, but this leads to sparsely populated columns with lots of NULL values. It’s just not pretty.
Others have blogged extensively about the problems with Name/Value Pairs in relational databases long before me, and here are some examples:
What I actually want to write about is the use of Name/Value Pairs in XML.
But.. wait a minute! Why would anybody use Name/Value Pairs in XML? XML is extensible and flexible by design, so there isn’t really a problem with being constrained by a fixed set of fields. New XML elements and attributes can be invented and added as needed, specifically to deal with variable and evolving data formats. Many applications use XML exactly for this reason.
However, I have seen various cases where people have decided to model their XML with Name/Value Pairs. As a simple example, the following XML document describes a car with Name/Value Pairs:
<object type="car"> <field name="brand" value="Honda"/> <field name="price" value="5000"/> <field name="year" value="1996"/> </object>
This XML document uses generic and non-descriptive element and attributes names, such as object, field, name, and value. This defeats the purpose of XML as a markup language where tag names should be used to describe the meaning of the different data items. This is similar to column names of a relational table, which should describe the data in the columns.
In this example, the values “brand”, “price”, and “year” are metadata and should really be tag names, because they describe the intended meaning of the values “Honda”, “5000″, and “1996″. Here is what this XML document should look like:
<car> <brand>Honda</brand> <price>5000</price> <year>1996</year> </car>
The problems with using Name/Value pairs in XML are similar as in relational tables:
- For the XML document with Name/Value Pairs in red above, it is impossible to define an XML Schema that enforces, for example, the following rule: If the attribute “name” has the value ‘price’, then the attribute “value” in the same “field” element must be numeric and greater than 0. In contrast, this rule is easy to define in an XML Schema for explicit tag names.
- Let’s say you often search by “year” and want an XML index to support such searches. If you use Name/Value Pairs, most databases will not allow you to define an index on only those “value” attributes where the “name” attribute in the same “field” element has the value ‘year’. As a result you will likely index more data than needed, which means your index is larger and more expensive to maintain than with explicit tag names. Insert, update, delete performance will suffer.
- Writing XPath expressions against Name/Value Pairs is complex and less efficient.
Not yet convinced? Try to write an XPath that retrieves the year of all the Honda cars that have a price greater than 5000. For Name/Value Pair documents the query would look like this (yikes!):
/object[@type="car" and field[@name = "brand" and @value = "Honda"] and field[@name = "price" and @value >= 5000] ]/field[@name="year"]/data(@value)
For corresponding documents with explicit tag names, the same query is much simpler and will be faster:
/car[brand="Honda" and price >= 5000]/year
I rest my case.