Chapter 8 • Analyzing Systems Using Data Dictionaries
The XML document tends to mirror the data dictionary structure. The first entry (other than
an XML line identifying the document) is <customers>, which defines the entire collection of
customer information. The less than (<) and greater than (>) symbols are used to identify tag
names (similar to HTML). The last line of the XML document is a closing tag, </customers>,
signifying the end of the customer information.
Customer is defined first and contains an attribute, the customer number. There is often a
discussion about whether data should be stored as an element or an attribute. In this case, the
customer number is stored as an attribute.
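The element-versus-attribute choice can be illustrated with a minimal sketch in Python, using the standard xml.etree.ElementTree module. The attribute name number, the <number> element, and both one-line documents are assumptions made for illustration; only the value C15008 and the <customers> and <customer> tags come from the text.

```python
import xml.etree.ElementTree as ET

# Two hypothetical ways to store the same customer number.
AS_ATTRIBUTE = '<customers><customer number="C15008"/></customers>'
AS_ELEMENT = '<customers><customer><number>C15008</number></customer></customers>'


def numbers_from_attribute(xml_text):
    """Read the customer number when it is stored as an attribute."""
    root = ET.fromstring(xml_text)
    return [customer.get("number") for customer in root.findall("customer")]


def numbers_from_element(xml_text):
    """Read the customer number when it is stored as a child element."""
    root = ET.fromstring(xml_text)
    return [customer.findtext("number") for customer in root.findall("customer")]


print(numbers_from_attribute(AS_ATTRIBUTE))
print(numbers_from_element(AS_ELEMENT))
```

Either form carries the same information. The attribute form keeps the identifier on the opening tag of <customer>, which is what the example document in this chapter does.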
The name tag, <name>, is defined next because it is the first entry in the data dictionary.
NAME is a structure consisting of LAST NAME, FIRST NAME, and an optional MIDDLE
INITIAL. In the XML document, this structure starts with <name> and is followed by
<lastname>, <firstname>, and <middle_initial>. Because spaces are not allowed in XML tag
names, an underscore is typically used to separate words. The closing </name> tag signifies the
end of the group of elements. Using a structure such as name saves time and coding if the
transformation displays the full name: each of the child elements will appear on one line,
separated by a space. The name element also contains an attribute, either I for individual or C
for corporation.
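As a rough sketch of why grouping the parts under <name> is convenient, the Python fragment below joins the child elements into one line separated by spaces. The attribute name type and the sample values are hypothetical; the text gives only the tag names and the codes I and C.

```python
import xml.etree.ElementTree as ET

# Hypothetical <name> structure; "type" is an assumed attribute name.
NAME_XML = """<name type="I">
  <lastname>Smith</lastname>
  <firstname>Pat</firstname>
  <middle_initial>J</middle_initial>
</name>"""


def full_name(name_element):
    """Join the text of the child elements, in document order, with spaces."""
    parts = [(child.text or "").strip() for child in name_element]
    return " ".join(part for part in parts if part)


name = ET.fromstring(NAME_XML)
print(full_name(name))   # Smith Pat J
print(name.get("type"))  # I
```

Because the function walks whatever children are present, a name without the optional <middle_initial> needs no special handling.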
Indentation is used to show which structures contain elements. Note that <address> is similar
to <customer>, but when we get to <order_information> there is a big difference.
There are multiple entries for <order_information>, each containing an <order_number>,
<order_date>, <shipping_date>, and <total>. Because the payment is made either by check or by
credit card, only one of these two payment elements may be present. In our example, payment is
by check. The dates have an attribute called format that indicates whether the date appears as
month, day, year; year, month, day; or day, month, year. If a credit card is used to make a
payment, a TYPE attribute contains either an M, V, A, D, or O, indicating the type of credit
card (MasterCard, Visa, and so on).
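The repeating order entries, the date format attribute, and the either/or payment can be sketched in Python as follows. Several details here are assumptions rather than part of the example document: the <payment>, <check>, <check_number>, and <credit_card> tag names, the placement of <payment> inside <order_information>, the format codes mdy, ymd, and dmy, and all sample values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical customer with two orders, one paid by check, one by credit card.
CUSTOMER_XML = """<customer number="C15008">
  <order_information>
    <order_number>00123</order_number>
    <order_date format="mdy">09/14/2008</order_date>
    <shipping_date format="mdy">09/16/2008</shipping_date>
    <total>1345.89</total>
    <payment>
      <check><check_number>7234</check_number></check>
    </payment>
  </order_information>
  <order_information>
    <order_number>00124</order_number>
    <order_date format="ymd">2008/10/02</order_date>
    <shipping_date format="ymd">2008/10/05</shipping_date>
    <total>240.00</total>
    <payment>
      <credit_card type="V"/>
    </payment>
  </order_information>
</customer>"""

# Assumed format codes mapped to strptime patterns.
DATE_FORMATS = {"mdy": "%m/%d/%Y", "ymd": "%Y/%m/%d", "dmy": "%d/%m/%Y"}


def parse_date(element):
    """Interpret the element text according to its format attribute."""
    pattern = DATE_FORMATS[element.get("format")]
    return datetime.strptime(element.text.strip(), pattern).date()


def payment_method(order):
    """Return 'check' or 'credit_card'; exactly one must be present."""
    payment = order.find("payment")
    methods = [] if payment is None else [child.tag for child in payment]
    if len(methods) != 1 or methods[0] not in ("check", "credit_card"):
        raise ValueError("expected exactly one of <check> or <credit_card>")
    return methods[0]


customer = ET.fromstring(CUSTOMER_XML)
orders = customer.findall("order_information")
for order in orders:
    print(order.findtext("order_number"),
          parse_date(order.find("order_date")),
          payment_method(order))
```

The format attribute lets the same parsing routine handle dates written in different orders, and the payment check makes the "only one may be present" rule explicit in code.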
XML Document Type Definitions
Often the element structure of XML content is defined using a document type definition (DTD).
A DTD is used to determine whether the XML document content is valid—that is, whether it
conforms to the order and type of data that must be present in the document. The DTD is easy to
create and well supported by standard software. Once the DTD has been completed, it may be
used to validate the XML document using standard XML tools. The DTD is easier to create if a
data dictionary has been completed, since the analyst has worked with users and made decisions
on the structure of the data.
Figure 8.17 illustrates the document type definition for the Customer XML document.
Keywords, such as !DOCTYPE, indicating the start of the DTD, must be in capital letters.
!ELEMENT describes an element, and !ATTLIST describes an attribute, listing the element
name followed by the attribute name. An element that has the keyword #PCDATA, for parsed
character data, is a primitive element, not further defined. An element that has a series of other
elements within parentheses means that they are child elements and must be in the order listed.
The statement <!ELEMENT name (lastname, firstname, middle_initial?)> means that the name
must have the last name followed by the first name, optionally followed by the middle initial.
The question mark after “middle_initial” means that the element is optional and may be
left out of the document for a particular customer. A plus sign means that the element is
repeatable and must occur one or more times. Customers must contain at least one customer tag
but could contain many customer tags. An asterisk means that the element may occur zero or
more times. Each customer may have zero to many orders. A vertical bar separates two or more
child elements that are mutually exclusive. Payment contains either check or credit card, but
not both.
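The occurrence symbols behave much like their counterparts in regular expressions. The Python sketch below rewrites a few content models by hand as regular expressions over the sequence of child tag names. It is an illustration of the ?, +, *, and | semantics only, not a DTD validator; real validation would rely on standard XML tools. The customer and payment models and the credit_card tag name are simplified assumptions.

```python
import re

# Hand-written regular expressions standing in for DTD content models.
# Each child tag name is followed by one space in the sequence being matched.
CONTENT_MODELS = {
    # (customer+)
    "customers": r"(customer )+",
    # (lastname, firstname, middle_initial?)
    "name": r"lastname firstname (middle_initial )?",
    # (name, address, order_information*)   -- simplified, assumed model
    "customer": r"name address (order_information )*",
    # (check | credit_card)                 -- assumed tag names
    "payment": r"(check |credit_card )",
}


def children_match(parent, child_tags):
    """Check a list of child tag names against the parent's content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return re.fullmatch(CONTENT_MODELS[parent], sequence) is not None


print(children_match("name", ["lastname", "firstname"]))        # True: ? allows omission
print(children_match("name", ["firstname", "lastname"]))        # False: order matters
print(children_match("customers", []))                          # False: + needs at least one
print(children_match("customer", ["name", "address"]))          # True: * allows zero orders
print(children_match("payment", ["check", "credit_card"]))      # False: | is either/or
```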
The attribute list definition for a customer number contains the keyword ID (in uppercase
letters). This means that a given value of the number attribute may appear only once in the XML
document as the value of an ID attribute. In that respect it is somewhat similar to a primary
key. The difference
is that, if the document had several different elements, each with an ID attribute, the given ID
(C15008 in this example) could appear only once. An ID must start with a letter or an underscore
and cannot be solely a number. The reason behind putting the customer number as an ID is to
ensure that it is not repeated in a longer document. The keyword #REQUIRED means that the
attribute must be present. A keyword of #IMPLIED means that the attribute is optional. A
document may also have an IDREF attribute, which links one element with another that is an ID. The
ORDER tag has a customer_number attribute defined as an IDREF, and the value C15008 must