Document Type Definitions
Document Type Definitions (DTDs) are an optional feature of XML documents. A document associated with a DTD has a set of rules regarding what elements and attributes can be part of the document and where they can appear. DTDs originate from SGML, although XML's DTDs are greatly simplified. The presence of DTDs in XML documents allows us to distinguish the concepts of well-formedness and validity.
Well-Formedness and Validity
If a document conforms to the rules of XML syntax (as described in the section "XML Instances"), it is considered well-formed. Well-formedness implies that XML processing software can read the document without basic parsing errors such as invalid character data, mismatched start and end tags, multiple attributes with the same name, and so on. The XML Specification mandates that if any well-formedness constraint is not met, the XML parser must report a fatal (non-recoverable) error. This rigid mandate makes it easy to separate the work of software focused on the logical structure of an XML document (what the markup means) from the mundane details of the physical structure of the document (the markup syntax).
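To make the idea of a well-formedness check concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree parser. The helper name is_well_formed and the sample documents (including the id value) are made up for illustration; the parser checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False on any well-formedness error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser stops at the first violation instead of trying to recover.
        return False


# Hypothetical sample documents, not taken from the SkatesTown examples.
samples = {
    "well-formed": "<po id='43871'><billTo/></po>",
    "mismatched tags": "<po><billTo></po>",
    "duplicate attribute": "<po id='1' id='2'/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample parses; the other two are rejected before any application logic sees them.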
However, well-formedness is not sufficient for most applications. Consider, for example, the SkatesTown order processing application. When an XML document is submitted to it, the application cares not merely that the document is well-formed XML but that it is indeed a purchase order in the specific XML format the application requires. The notion of format applies to the set of rules describing SkatesTown's purchase orders: "The document must begin with a po element that has two attributes (id and submitted) which will be followed by a billTo element…" and so on. In other words, before a submitted document is processed, it must be identified as a valid purchase order.
This is where the notion of validity comes in. DTDs offer an automated, declarative mechanism for validating the contents of XML documents as they are parsed. Therefore, XML applications can limit the amount of validation they need to perform. If the SkatesTown purchase order processing application could not delegate validation to the XML processor, it would have to express all validation rules directly in code. Code is procedural in nature and much harder to maintain than DTDs, which are declarative and have a reasonably readable syntax.
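As a rough illustration of what expressing validation rules directly in code looks like, the following Python sketch hand-codes only the fragment of the purchase order rule quoted earlier. The function name check_purchase_order is hypothetical, the sample attribute values are invented, and the sketch assumes that billTo is the first child element of po; the rest of the rule is elided in the text and is not covered here.

```python
import xml.etree.ElementTree as ET


def check_purchase_order(text: str) -> list[str]:
    """Hand-coded checks for the quoted fragment of the purchase order rule."""
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    problems = []
    if root.tag != "po":
        problems.append(f"root element is <{root.tag}>, expected <po>")
    for name in ("id", "submitted"):
        if name not in root.attrib:
            problems.append(f"missing attribute '{name}' on root element")
    children = list(root)
    # Assumption: billTo must be the first child element of po.
    if not children or children[0].tag != "billTo":
        problems.append("first child element must be <billTo>")
    return problems


print(check_purchase_order("<po id='1' submitted='2001-10-05'><billTo/></po>"))
print(check_purchase_order("<po id='1'><shipTo/></po>"))
```

Every additional rule means another branch like these, which is the maintenance burden a declarative description avoids.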
To handle validity checks, DTDs must enable the following:
Last but not least, there needs to be a mechanism to associate DTDs with XML documents.
DTDs are a mechanism to express the valid structure of a document. One way to visualize the structure of a document is as a tree of possible element and attribute combinations. For example, Figure 2.3 shows the document structure for purchase orders as expressed by a popular XML processing tool. The image uses some syntax from regular expressions to visualize the multiplicity of elements: question mark (?) stands for optional (zero or one), asterisk (*) stands for any (zero or more), and plus (+) stands for at least some (one or more).
Every element in the document structure tree has an associated model group. Model groups identify the sequencing and multiplicity of element content. There are two types of model groups: sequence and choice. Sequence defines the exact order in which child elements must appear. In DTDs, the sequence operator in model groups is the comma (,). The model group (A, B, C) defines a content model where the first child element is A, followed by B, followed by C. Choice defines the possible elements that can appear at a given position in the content model. The choice operator in model groups is the pipe character (|). The model group (A | B | C) defines a content model with exactly one child element, which can be A or B or C. Sequences and choices can be nested, as in ((A | (X, Y, Z)), B, (C | D)). This content model defines four possible combinations of child elements: A B C; A B D; X Y Z B C; and X Y Z B D.
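The combinations that a nested model group allows can be enumerated mechanically. The following Python sketch is illustrative only: the helper names seq, choice, and elem are hypothetical and not part of any XML library, and the sketch handles only elements that occur exactly once.

```python
from itertools import product


def elem(name):
    """A single required element: one alternative containing one name."""
    return [(name,)]


def seq(*parts):
    """Sequence (,): one alternative from each part, concatenated in order."""
    return [sum(combo, ()) for combo in product(*parts)]


def choice(*parts):
    """Choice (|): every alternative from any one of the parts."""
    return [alt for part in parts for alt in part]


# The model group ((A | (X, Y, Z)), B, (C | D))
model = seq(
    choice(elem("A"), seq(elem("X"), elem("Y"), elem("Z"))),
    elem("B"),
    choice(elem("C"), elem("D")),
)

for combo in model:
    print(" ".join(combo))
# Prints:
# A B C
# A B D
# X Y Z B C
# X Y Z B D
```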
The multiplicity of elements is defined using the same regular expression syntax used in document structure trees. The absence of a suffix stands for exactly one, question mark (?) stands for optional (zero or one), asterisk (*) stands for any (zero or more), and plus (+) stands for at least some (one or more). For example, the model group (A, B?, C*, D+) allows for combinations of child elements such as A D…, A B D…, A C… D…, and A B C… D… (where … stands for "potentially many more of the same element").
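Because the multiplicity suffixes are borrowed from regular expressions, the model group (A, B?, C*, D+) can be approximated with an ordinary regular expression. The Python sketch below is an illustration under a simplifying assumption, not how an XML parser works: each element name is a single character, so a list of child element names can be joined into one string. The name matches_model is hypothetical.

```python
import re

# (A, B?, C*, D+) with each single-letter element name written as one character.
MODEL_GROUP = re.compile(r"AB?C*D+")


def matches_model(children):
    """Check a list of single-letter child element names against the model group."""
    return MODEL_GROUP.fullmatch("".join(children)) is not None


print(matches_model(["A", "D"]))                      # True: B and C omitted
print(matches_model(["A", "B", "C", "C", "D", "D"]))  # True: repeated C and D
print(matches_model(["A", "B", "C"]))                 # False: at least one D is required
print(matches_model(["B", "A", "D"]))                 # False: A must come first
```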
Are DTDs Enough?
Documents associated with DTDs are a huge step forward from basic XML markup. DTDs allow for validating document structure (element content, allowed attributes, and their value types), which significantly reduces the amount of custom validation code that needs to be written in XML applications. However, DTDs have some notable deficiencies:
For these reasons, this chapter will not discuss DTDs in any further detail. We won't even introduce the basic DTD syntax here because data-oriented XML applications have moved away from DTDs; these applications use another mechanism to validate XML documents and to enforce document structure and datatype rules. To address the problems inherent in DTDs, the XML community developed XML Schema, a much richer meta-language for XML documents expressed natively in XML.