What is an XML Schema and why should I care?

February 29, 2012

What is an XML Schema? Some of you may already know this, others don’t. So before I’m going to share some more technical information about XML Schemas in subsequent blog posts, I better get some of the basics out of the way first.

When you process and manage information in XML format, you can choose to use an XML Schema with your XML documents. Roughly speaking, an XML Schema can be used to define what you want your XML documents to look like. For example, in an XML Schema you can define:

  • Which elements and attributes are allowed to occur in your XML documents
  • How the elements can be or must be nested, or the order in which the elements must appear
  • Which elements or attributes are mandatory vs. optional
  • The number of times a given element can be repeated within a document (e.g. to allow for multiple phone numbers per customer, multiple items per order, etc.)
  • The data types of the element and attribute values, such as xs:integer, xs:decimal, xs:string, etc.
  • The namespaces that the elements belong to
  • …and so on.

If you choose to create an XML Schema, it may define just some or all of the aspects listed above. The designer of the XML Schema can choose the degree to which the schema constraints the characteristics of the XML documents. For example, an XML Schema can be very loose and define only a few key features for your XML documents and allow for a lot of flexibility. Or it can be very strict to tightly control the XML data in every aspect. Or anything in between.

The use of an XML Schema is optional, i.e. an XML Schema is not required to store, index, query, or update XML documents. However, an XML Schema can be very useful to ensure that the XML documents that you receive or  produce are compliant with certain structural rules that allow applications to process the XML. In other words, XML Schemas help you to enforce data quality.

Validation

If an document complies with a given XML Schema, then the document is said to be valid for this schema. A document might be valid for one schema but invalid for another schema. The process of testing an XML document for compliance with an XML Schema is called validation.

When an XML document is parsed by an XML parser, validation can be enabled as an optional part of the parsing process. Full validation of an XML document always requires XML parsing. For many documents and schemas, validation typically incurs only a small delta cost (in terms of CPU usage) on top of the cost of XML parsing.

What does an an XML Schema look like?

An XML Schema itself is an XML document! But, a very special document that needs to comply with very specific rules that are defined by -you guessed it!- another XML Schema, i.e. the schema for schemas.

Large XML schemas can consist of multiple schema documents that reference each other through import and include relationships. This allows you to compose an XML Schema out of smaller building blocks in a modular fashion.

I don’t want to go into the syntax details of XML Schemas here, but there are some useful resources available:

When and why should I use an XML Schema?

Simply put, if you want to ensure data quality and detect XML documents that do not comply with an expected format, use an XML Schema and validate each document!

However, what if XML documents pass through multiple components of your IT infrastructure, such as a message queue, an application server, an enterprise service bus, and the database system? If these components do not modify the XML but merely read & route it, examine whether all of these components need to validate each document. For example, if the application server has already validated a document before insertion into a DB2 database, does the document need to be validated again in DB2? Maybe not, if you trust the application layer. Maybe yes, if you don’t.

An XML Schema is also often used as a “contract” between two or more parties that exchange XML documents. With this contract the parties agree on a specific format and structure of the XML messages that they send and receive, to ensure seamless operation.

Practically every vertical industry has defined XML Schemas to standardize XML message formats for the data processing in their industry. A good overview is given by the introduction of this article:

“Getting started with Industry Formats and Services with pureXML”: http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/

How can I validate XML documents in DB2?

Simple. First, you register one or multiple XML Schemas in the DB2 database. This can be done with CLP commands, stored procedures, or through API calls in the JDBC or .NET interface to DB2. After a schema is registered in DB2, you can use it to validate XML documents in DB2, typically when you insert, load, or update XML documents. You can enforce a single XML Schema for all XML documents in an XML column, or you can allow multiple XML Schemas per column. A database administrator can force automatic validation upon document insert, or allow applications to choose one of the previously registered schema for validation whenever a document inserted.

And… validation can also be done in SQL statements?

Yup. The SQL standard defines a function called XMLVALIDATE, which can be used for document validation in INSERT statement, UPDATE statements, triggers, stored procedures, and even in queries.

Here is a simple example of an INSERT statement that adds a row to a customer customer table, which consists of an integer ID column and an XML column called “doc”:

INSERT INTO customer(id, doc)
VALUES (?, XMLVALIDATE( ? ACCORDING TO XMLSCHEMA ID db2admin.custxsd) );

The id and the document are provided by parameter markers “?”, and the XMLVALIDATE function that is wrapped around the second parameter ensures validation against the XML Schema that has been regoistered under the identifier db2admin.custxsd.

If the inserted document is not compliant with the XML Schema, the INSERT statement fails with an appropriate error message. Similarly, the XMLVALIDATE function can also be used in the right-hand side of the SET clause of an UPDATE statement that modifies or replaces an XML document.

Ok, so much for now. In my next blog post we’ll go into more detail.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 52 other followers

%d bloggers like this: