
Document- Versus Data-Centric XML

Generally speaking, there are two broad application areas of XML technologies. The first relates to document-centric applications, and the second to data-centric applications. Because XML can be used in so many different ways, it is important to understand the difference between these two categories.

Document-Centric XML

Because of its SGML origins, in the early days of its existence XML gained rapid adoption within publishing systems as a mechanism for representing semi-structured documents such as technical manuals, legal documents, and product catalogs. The content in these documents is typically meant for human consumption, although it could be processed by any number of applications before it is presented to humans. The key element of these documents is semi-structured marked-up text.

The following markup is a perfect example of XML used in a document-centric manner. The content is directed towards human consumption—it's part of the FastGlide skateboard user guide. The content is semi-structured. The usage rules for tags such as <B>, <I> and <LINK> are very loosely defined; they could appear pretty much anywhere in the document:

<H1>Skateboard Usage Requirements</H1>
<P>In order to use the <B>FastGlide</B> skateboard you have to
have:</P>
<ITEM>A strong pair of legs.</ITEM>
<ITEM>A reasonably long stretch of smooth road surface.</ITEM>
<ITEM>The impulse to impress others.</ITEM>
<P>If you have all of the above, you can proceed to <LINK
HREF="Chapter2.xml">Getting on the Board</LINK>.</P>
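Tags interleaved with running text (what XML calls mixed content) are what make this markup semi-structured. The following minimal Java sketch, using the standard JAXP DOM parser, lists the children of the closing paragraph in document order. The <GUIDE> root element and the class and method names are hypothetical, added only because a well-formed XML document needs a single root.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class MixedContentDemo {
    // The closing paragraph of the user guide, wrapped in a hypothetical <GUIDE>
    // root element because a well-formed XML document needs exactly one root.
    static final String GUIDE = "<GUIDE><P>If you have all of the above, you can proceed to "
        + "<LINK HREF=\"Chapter2.xml\">Getting on the Board</LINK>.</P></GUIDE>";

    // Lists the children of the first <P> in document order, labeling each as text or element.
    static List<String> describeParagraph(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element p = (Element) doc.getElementsByTagName("P").item(0);
            List<String> parts = new ArrayList<>();
            NodeList children = p.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    parts.add("element:" + child.getNodeName());
                } else if (child.getNodeType() == Node.TEXT_NODE) {
                    parts.add("text:" + child.getNodeValue());
                }
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse guide markup", e);
        }
    }

    public static void main(String[] args) {
        describeParagraph(GUIDE).forEach(System.out::println);
    }
}
```

The paragraph comes back as text, then a <LINK> element, then more text: the inline tag sits in the middle of a sentence rather than in a fixed slot.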

Data-Centric XML

By contrast, data-centric XML is used to mark up highly structured information such as the textual representation of relational data from databases, financial transaction information, and programming language data structures. Data-centric XML is typically generated by machines and is meant for machine consumption. It is XML's natural ability to nest and repeat markup that makes it the perfect choice for representing these types of data.

Consider the purchase order example in Listing 2.1. It is a purchase order sent to SkatesTown by the Skateboard Warehouse, a skateboard retailer. The order is for 5 backpacks, 12 skateboards, and 1,000 SkatesTown promotional stickers (this is what the stock keeping unit [SKU] of 008-PR stands for).

Listing 2.1 Purchase Order in XML
<po id="43871" submitted="2001-10-05">
   <billTo>
      <company>The Skateboard Warehouse</company>
      <street>One Warehouse Park</street>
      <street>Building 17</street>
   </billTo>
   <shipTo>
      <company>The Skateboard Warehouse</company>
      <street>One Warehouse Park</street>
      <street>Building 17</street>
   </shipTo>
   <order>
      <item sku="318-BP" quantity="5">
         <description>Skateboard backpack; five pockets</description>
      </item>
      <item sku="947-TI" quantity="12">
         <description>Street-style titanium skateboard.</description>
      </item>
      <item sku="008-PR" quantity="1000"/>
   </order>
</po>
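Because data-centric XML is meant for machine consumption, a program can pull values out of it directly. The following is a minimal sketch, assuming the standard JAXP DOM parser and an abbreviated copy of the purchase order with the address elements left out; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PurchaseOrderReader {
    // Abbreviated form of the purchase order: the address elements are omitted.
    static final String PO_XML =
        "<po id=\"43871\" submitted=\"2001-10-05\"><order>"
      + "<item sku=\"318-BP\" quantity=\"5\">"
      + "<description>Skateboard backpack; five pockets</description></item>"
      + "<item sku=\"947-TI\" quantity=\"12\">"
      + "<description>Street-style titanium skateboard.</description></item>"
      + "<item sku=\"008-PR\" quantity=\"1000\"/>"
      + "</order></po>";

    // Maps each SKU to its ordered quantity, preserving document order.
    static Map<String, Integer> quantitiesBySku(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Integer> result = new LinkedHashMap<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.put(item.getAttribute("sku"), Integer.parseInt(item.getAttribute("quantity")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read purchase order", e);
        }
    }

    public static void main(String[] args) {
        quantitiesBySku(PO_XML).forEach((sku, qty) -> System.out.println(sku + " x " + qty));
    }
}
```

No human reads the document along the way: the program goes straight from attributes to typed values.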

The use of XML is very different from the previous user guide example:

  • The ratio of markup to content is high. The XML includes many different types of tags. There is no long-running text.

  • The XML includes machine-generated information; for example, the submission date of the purchase order (2001-10-05) uses the year-month-day format. A human authoring an XML document is unlikely to enter a date in this format.

  • The tags are organized in a highly structured manner. The order and position of tags relative to one another matter. For example, <description> must be under <item>, which must be under <order>, which must be under <po>. The <order> tag can be used only once in the document.

  • Markup is used to describe what a piece of information means rather than how it should be presented to a human.
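Structural rules like these are exactly what a machine can check. The sketch below, assuming the standard JAXP DOM API and reading "under" as "directly under", tests the nesting rules from the list and parses the year-month-day submission date. The class and method names are hypothetical; a real system would more likely express such rules in a schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PurchaseOrderRules {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Checks the nesting rules, reading "under" as "directly under":
    // <po> is the root, it holds exactly one <order>, every <item> sits in
    // <order>, and every <description> sits in <item>.
    static boolean followsNestingRules(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList orders = doc.getElementsByTagName("order");
        return root.getTagName().equals("po")
            && orders.getLength() == 1
            && orders.item(0).getParentNode().isSameNode(root)
            && allParentsNamed(doc, "item", "order")
            && allParentsNamed(doc, "description", "item");
    }

    // True if every element named `child` has a parent element named `parent`.
    private static boolean allParentsNamed(Document doc, String child, String parent) {
        NodeList nodes = doc.getElementsByTagName(child);
        for (int i = 0; i < nodes.getLength(); i++) {
            if (!parent.equals(nodes.item(i).getParentNode().getNodeName())) {
                return false;
            }
        }
        return true;
    }

    // The year-month-day value is ISO 8601, which LocalDate.parse accepts by default.
    static LocalDate submittedDate(Document doc) {
        return LocalDate.parse(doc.getDocumentElement().getAttribute("submitted"));
    }

    public static void main(String[] args) {
        Document po = parse("<po id=\"43871\" submitted=\"2001-10-05\"><order>"
            + "<item sku=\"008-PR\" quantity=\"1000\"/></order></po>");
        System.out.println(followsNestingRules(po) + " " + submittedDate(po));
    }
}
```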

In short, if you can easily imagine the XML as a data structure in your favorite programming language, you are probably looking at a data-centric use of XML. An example Java class that could, with a bit more work, be used to represent the purchase order data is shown here:

import java.util.Date;

class PO {
    int id;
    Date submitted;
    Address billTo;     // Address and Item are classes still to be written
    Address shipTo;
    Item[] order;
}
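To make the "bit more work" concrete, here is one possible sketch that fills such a class from the XML by walking a JAXP DOM tree. It makes several assumptions: Address and Item are hypothetical classes holding only the fields visible in the listing, java.time.LocalDate replaces java.util.Date for the submission date, and a List<Item> replaces the array. Production code would more likely use a data-binding framework than hand-written DOM traversal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical supporting classes; Address keeps only the fields visible in the listing.
class Address {
    String company;
    List<String> streets = new ArrayList<>();
}

class Item {
    String sku;
    int quantity;
    String description; // null when the <item> has no <description>
}

class PO {
    int id;
    LocalDate submitted;   // LocalDate swapped in for java.util.Date
    Address billTo;
    Address shipTo;
    List<Item> order = new ArrayList<>();

    // Binds a <po> document to a PO object by walking its DOM tree.
    static PO fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            PO po = new PO();
            po.id = Integer.parseInt(root.getAttribute("id"));
            po.submitted = LocalDate.parse(root.getAttribute("submitted"));
            po.billTo = readAddress(firstDescendant(root, "billTo"));
            po.shipTo = readAddress(firstDescendant(root, "shipTo"));
            NodeList items = root.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element e = (Element) items.item(i);
                Item item = new Item();
                item.sku = e.getAttribute("sku");
                item.quantity = Integer.parseInt(e.getAttribute("quantity"));
                Element desc = firstDescendant(e, "description");
                item.description = desc == null ? null : desc.getTextContent();
                po.order.add(item);
            }
            return po;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot read purchase order", ex);
        }
    }

    // Returns the first descendant element with the given name, or null if there is none.
    private static Element firstDescendant(Element parent, String name) {
        NodeList list = parent.getElementsByTagName(name);
        return list.getLength() == 0 ? null : (Element) list.item(0);
    }

    private static Address readAddress(Element e) {
        if (e == null) return null;
        Address a = new Address();
        a.company = firstDescendant(e, "company").getTextContent();
        NodeList streets = e.getElementsByTagName("street");
        for (int i = 0; i < streets.getLength(); i++) {
            a.streets.add(streets.item(i).getTextContent());
        }
        return a;
    }
}

class PODemo {
    static final String SAMPLE = "<po id=\"43871\" submitted=\"2001-10-05\">"
        + "<billTo><company>The Skateboard Warehouse</company>"
        + "<street>One Warehouse Park</street><street>Building 17</street></billTo>"
        + "<shipTo><company>The Skateboard Warehouse</company>"
        + "<street>One Warehouse Park</street><street>Building 17</street></shipTo>"
        + "<order><item sku=\"318-BP\" quantity=\"5\">"
        + "<description>Skateboard backpack; five pockets</description></item>"
        + "<item sku=\"947-TI\" quantity=\"12\">"
        + "<description>Street-style titanium skateboard.</description></item>"
        + "<item sku=\"008-PR\" quantity=\"1000\"/></order></po>";

    public static void main(String[] args) {
        PO po = PO.fromXml(SAMPLE);
        for (Item item : po.order) {
            System.out.println(item.sku + " x " + item.quantity);
        }
    }
}
```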

Document Lifetime

Document- and data-centric uses of XML can differ in one other very significant aspect: the lifetime of the XML document. XML documents for human consumption (such as technical manuals and research papers) typically live a long time because the information they contain remains useful for a long time. On the other hand, some data-centric XML could live for only a few milliseconds. Consider a database that returns the results of a query in XML format. The whole operation takes several milliseconds, and once the results have been used, the data is discarded. Further, no real XML document ever exists; the XML is just bits on a wire or bits in an application's data structure. Still, for convenience, we will use the term XML document to refer to any whole piece of XML being used. To refer to parts of a whole XML document, this book uses the highly technical term chunk.
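The sketch below illustrates such short-lived XML. A hypothetical producer (standing in for the database) builds a small XML result in memory and serializes it to a String with the standard JAXP transformer, and a hypothetical consumer parses that String and keeps only the value it needs. No file is ever written, and the XML disappears as soon as the String is garbage-collected. All names are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TransientXml {
    // Producer: builds the XML for one hypothetical query-result row entirely in memory.
    static String produce(String sku, int quantity) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setAttribute("sku", sku);
            item.setAttribute("quantity", Integer.toString(quantity));
            doc.appendChild(item);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot serialize result", e);
        }
    }

    // Consumer: parses the string straight from memory and keeps only the quantity.
    static int consumeQuantity(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return Integer.parseInt(doc.getDocumentElement().getAttribute("quantity"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse result", e);
        }
    }

    public static void main(String[] args) {
        String xml = produce("947-TI", 12);   // the "document" exists only as this String
        System.out.println(consumeQuantity(xml));
    }
}
```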

Web services are about data-centric uses of XML. Throughout the rest of this chapter and the rest of this book, we purposefully avoid discussing document-centric XML.
