Name/Value Pairs – A pretty bad idea in XML as in Relational !

January 21, 2011

Today’s topic is “Name/Value Pairs” – a data modeling approach that is also known as Key-Value Pairs (KVP) or Entity-Attribute-Value model (EAV).

In the relational database world, the Name/Value Pair approach is sometimes used when the number of required columns in a table is very large, unknown, or changing over time. In that case it is difficult to define a table with a fixed set of columns. That’s when some people resort to Name/Value Pairs.

To give an example, consider the following record from a traditional database table with columns id, firstname, lastname, company, and dateOfBirth:

(1, John, Doe, IBM, 1978-11-05)

In a Name/Value Pair table, the same information would be stored across four different rows with columns id, name, and value:

(1, firstname, John)
(1, lastname, Doe)
(1, company, IBM)
(1, dateOfBirth, 1978-11-05)

The flexibility of this table is wonderful (at first!), because you can store records of pretty much any format in such a table, even if the set of required fields is not known in advance or growing over time. Instead of defining fixed column names, you simply stick the field names into the column name and pair them up with a value.

BUT….  it’s well understood that such Name/Value Pair tables have serious problems that cause a lof of headache over time. In particular:

  • It’s very hard to define constraints for Name/Value Pair tables, because the logical meaning of the column value changes from row to row. Data quality is hard to enforce.
  • Writing business queries against Name/Value Pair tables is complex and often requires many self-joins. Reporting queries with many predicates are particularly difficult and inefficient.
  • Column statistics that usually aid database optimizers in the selection of good access plans often do not work well for Name/Value Pairs.
  • Mixing values of different data types in the column value causes further complications. The column value needs to be of type VARCHAR which means that correct numeric or DATE comparisons are not straight forward, may require casting, and it’s difficult to define adequate indexes. Note that the numeric comparison value > 5000  is not the same as the string comparison value > ‘5000’. The latter would also be true if value contains the string ‘7’. A workaround is to define multiple value columns, one for each required data type, but this leads to sparsely populated columns with lots of NULL values. It’s just not pretty.

Others have blogged extensively about the problems with Name/Value Pairs in relational databases long before me, and here are some examples:

What I actually want to write about is the use of Name/Value Pairs in XML.

But.. wait a minute! Why would anybody use Name/Value Pairs in XML? XML is extensible and flexible by design, so there isn’t really a problem with being constrained by a fixed set of fields. New XML elements and attributes can be invented and added as needed, specifically to deal with variable and evolving data formats. Many applications use XML exactly for this reason.

However, I have seen various cases where people have decided to model their XML with Name/Value Pairs. As a simple example, the following XML document describes a car with Name/Value Pairs:

<object type="car">
    <field name="brand" value="Honda"/>
    <field name="price" value="5000"/>
    <field name="year" value="1996"/>
</object>

This XML document uses generic and non-descriptive element and attributes names, such as object, field, name, and value. This defeats the purpose of XML as a markup language where tag names should be used to describe the meaning of the different data items. This is similar to column names of a relational table, which should describe the data in the columns.

In this example, the values “brand”, “price”, and “year” are metadata and should really be tag names, because they describe the intended meaning of the values “Honda”, “5000”, and “1996”. Here is what this XML document should look like:

<car>
    <brand>Honda</brand>
    <price>5000</price>
    <year>1996</year>
</car>

The problems with using Name/Value pairs in XML are similar as in relational tables:

  • For the XML document with Name/Value Pairs in red above, it is impossible to define an XML Schema that enforces, for example, the following rule: If the attribute “name”  has the value ‘price’, then the attribute “value” in the same “field” element must be numeric and greater than 0. In contrast, this rule is easy to define in an XML Schema for explicit tag names.
  • Let’s say you often search by “year” and want an XML index to support such searches. If you use Name/Value Pairs, most databases will not allow you to define an index on only those “value” attributes where the “name” attribute in the same “field” element has the value ‘year’. As a result you will likely index more data than needed, which means your index is larger and more expensive to maintain than with explicit tag names. Insert, update, delete performance will suffer.
  • Writing XPath expressions against Name/Value Pairs is complex and less efficient.

Not yet convinced? Try to write an XPath that retrieves the year of all the Honda cars that have a price greater than 5000.  For Name/Value Pair documents the query would look like this (yikes!):

/object[@type="car" and
        field[@name = "brand" and @value = "Honda"] and
        field[@name = "price" and @value >= 5000]
       ]/field[@name="year"]/data(@value) 

For corresponding documents with explicit tag names, the same query is much simpler and will be faster:

/car[brand="Honda" and price >= 5000]/year

I rest my case.

Eduardo Ciliendo/Switzerland/IBM@IBMCH, Eileen Lin/Santa Teresa/IBM@IBMUS, Mauro Tibolla/Switzerland/IBM@IBMCH
About these ads

15 Responses to “Name/Value Pairs – A pretty bad idea in XML as in Relational !”

  1. Adi Says:

    Agreed, using Name/Value pairs defeats the whole point of using XML!

    Unrelated, I just started the architecture phase of a project in which I plan to leverage DB2 pureXML…and I discovered your book a few days ago…what a pleasure to read..I will be reading that for the next few days before I move forward..excellent work.

  2. Dan Smith Says:

    I agree. However what you need to appreciate though when ‘tutting’ at the perceived ignorance of those who are producing these unclear schemas is that the Name/Value pair format is what is returned when you query
    NAME, VALUE table data using the : –
    FOR XML clause

    When FOR XML SqlClient.SqlCommands are used in .Net and combined with ExecuteXmlReader and XmlDataDocument
    it quickly and (otherwise) extremely elegantly returns XML data from a SQL Server database into an XML document – it makes writing web services a cinch.

    The only acceptable alternative I can find is to produce the XML in the CLR environment which will be more computationally expensive and also take me a lot longer. I’ve spent 4 hours trying to find an elegant way to achieve it in T-SQL (which shows just how much I do agree with your statement).

    I cite http://stackoverflow.com/questions/4234704/sql-table-key-value-pairs-to-a-xml

    I rejected manual concatenation using a TSQL function which was suggested there as it gets entitized when added to my parent FOR XML statement.

    I’m now going to spend another 2 hours writing my own XML output mechanism in .Net – purely so I can name the tags correctly. Most .Net / SQL Server developers aren’t going to be so fanatical about the output when so much time investment is needed.

    So, as a DB2 ‘bod’ I expect you’ll agree with me that Microsoft are responsible for this evil plague of poor XML schemas – for not allowing us to use columnar data as XML names.

    (Even PIVOT on SQL Server won’t let you use a sub-query to get your column names but instead insists they are fixed).

    Keep up the good work.
    Dan


    • Thanks for your comment, Dan! I agree with you. In particular, yes, if a relational table is already designed to hold name/value pairs, then querying that table with the FOR XML clause produces XML that’s also in the name/value format.
      As for converting relational data to XML, I just posted a blog entry on the functions XMLROW and XMLGROUP….

      Matthias

  3. Abhinav Says:

    I completely agree with you. Though we have two use-cases where we are using it.

    First, We have a generic cache solution to cache web-services where we need a generic way to store keys to search later based on these keys. The keys vary for different operations. Here we are using 5 name-value pairs but not as attributes. What we have is like CustNo1234String. This works well for us.

    Second, We have WSDLs which has fixed contracts. Here, we have a requirement to keep adding new fields with every release and these releases are quite fast (twice a week). Changing the web-services that is consumed by multiple consumers is pretty difficult. Here we have the Operation request and response divided into 2 structures. and . Base changes once a year with major release and we rationalize the group and include them in at that time. All the new fields we add in the group which is name value pair like DOB12012001Date. is a repeating structure. doesn’t helped us much since there are restrictions to have mandatory elements preceding element types. We are using XSD 1.0 and our tool doesn’t support XSD 1.1.

    Are you referring to using name/value as attributes is a bad idea or using it in any way a bad idea? Do you really think above use-cases are bad designs and we can improve it without using it?

    Regards,
    Abhinav.

  4. Alpheus Says:

    It should be kept in mind that there’s a certain “sweet spot” between “value-key pairs” and going straight to tags. When I worked on XML output of a natural language processor, and I saw a lot of code like “”; when I first started working with the data, I thought, “this is stupid!”, and proceeded to convert the tags to tags.

    It turned out to be a big mistake: the section names weren’t set in stone; many times, I would have to filter out invalid XML characters–and I could never seem to get all of them–and it would have been *easier* to index than to index the XML tags. And for everything else–I appreciated the flexibility, and the consistency, from their value-pair decisions.

    Of course, this is a special case–natural language processing is a complex endeavor; the flexibility was needed; and since the processing was of transcribed medical records, we didn’t have the “field” nonsense described in your post–xml tags were typically things like “section”, “medicine”, “procedure”, and so forth–and the XML was almost literally a parse tree of natural language (so it was easy, even natural, to traverse tags like “location” and “problem” to get “left elbow injury”). But having the trio of “type”, “name”, and sometimes “value”, made more sense than I originally recognised!

    (And the same thing can be said of my attempts to parse the original text into something that the UI could use. I started out specific, only to realize that I ought to have been more general.)

  5. ashish Says:

    how to unmarshall such a type of xml in java

  6. Ashish Says:

    Hi Matthias,

    XML format which you described

    Say if we are not having the specified XSD and we are supposed to develop all Bean classes ourselves, Then what should be the appropriate approach to unmarshall such a format of XML.

  7. Ashish Says:

    I am unable to paste xml directly so i removed the tag

    object type=”car”
    field name=”brand” value=”Honda”
    field name=”price” value=”5000″
    field name=”year” value=”1996″
    object


    • In my opinion, unmarshalling this type of XML format (i.e. the name-value pair XML format) to a Java object is often more difficult than using an XML format with explicit tags names.

      If your XML has explicit element and attribute names, then the unmarshalling can simply refer to these tag names to map the XML to a Java object. But, if you use the name-value pair format with generic tag names, it becomes harder to define this mapping. This difference is -conceptually- similar to the difference between the two XPath queries in this blog post.

      However, this also depend very much on what the actual Java object looks like that you want to populate.

      The best way to get a more precise answer is probably to prepare a concrete example of XML and Java object and post it in a Java/XML forum, such as http://www.coderanch.com/forums/f-31/XML
      or http://www.java-forums.org/xml/.

  8. RonK Says:

    The (unstated) downside of this approach is that the parameter names are now part of the XML Schema. So every time someone thinks up or needs to send a new parameter, the schema has to be revised and reissued.

    Depending on the version scheme, this may involve new namespaces, and incompatibilities between older and newer applications, even in messages where the new parameter is not being passed!

    A good rule of thumb … if the number of possible parameter types is not absolutely fixed for all time, be very leery about any design pattern that hard-wires parameter names into your schema.

    Presenting the parameter name as a value inside a element nicely sidesteps this problem.


    • Hi Ron,

      thanks for your comments.

      You are saying that a drawback of using explicit tag names is that every time there is a new element (“parameter”), someone needs to revise and reissue the XML Schema.

      I would argue that this concern applies to both explicit tags and generic tags (name/value pairs) in the same way.

      The key question is, do I want my XML Schema to contain and check all possible parameter names?

      If not, then the XML Schema for the XML format with explicit tag names can be designed such that ANY element name can appear in the relevant locations in the document. Then the schema does not need to be updated every time a new element is added, just like in the case with generic tag names.

      If yes, then the XML Schema for the XML format with the generic tags would have to specify the list of allowed values (allowed parameter names) for the generic element or attribute “name”. When a new paramater is added, the schema needs to be updated, just like in the case with explicit tag names.

      So, if the XML Schemas for the explicit tag format and the generic tag format are expected to have the same degree of detail, then the concern that you raise (which is conceptually valid) applies to both XML formats.

      Does that make sense?

      Thanks,

      Matthias


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 51 other followers

%d bloggers like this: