
Chapter 8  •  Analyzing Systems Using Data Dictionaries     227

                     The XML document tends to mirror the data dictionary structure. The first entry (other than
                 an XML line identifying the document) is <customers>, which defines the entire collection of
                 customer information. The less than (<) and greater than (>) symbols are used to identify tag
                 names (similar to HTML). The last line of the XML document is a closing tag, </customers>,
                 signifying the end of the customer information.
                     Customer is defined first and contains an attribute, the customer number. There is often
                 debate about whether data should be stored as an element or as an attribute. In this case, the
                 customer number is stored as an attribute.
                     The name tag, <name>, is defined next because it is the first entry in the data dictionary.
                 NAME is a structure consisting of LAST NAME, FIRST NAME, and an optional MIDDLE
                 INITIAL. In the XML document, this structure starts with <name> and is followed by
                 <lastname>, <firstname>, and <middle_initial>. Because spaces are not allowed in XML tag
                 names, an underscore is typically used to separate words. The closing </name> tag signifies the
                 end of the group of elements. Using a structure such as name saves time and coding if the
                 transformation displays the full name: each of the child elements will appear on one line, separated
                 by a space. Name also contains an attribute whose value is either I for individual or C for corporation.
                     Indentation is used to show which structures contain elements. Note that <address> is similar
                 to <customer>, but when we get to <order_information> there is a big difference.
                     There are multiple entries for <order_information>, each containing an <order_number>,
                 <order_date>, <shipping_date>, and <total>. Because the payment is made either by check or
                 credit card, only one of these may be present. In our example, payment is by check. The dates
                 have an attribute called format that indicates whether the date appears as month, day, year; year,
                 month, day; or day, month, year. If a credit card is used to make a payment, a TYPE attribute contains
                 one of M, V, A, D, or O, indicating the type of credit card (MasterCard, Visa, and so on).
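The structure described above can be sketched and checked for well-formedness with Python's standard library. This is only an illustration, not the figure from the text: the element names and the customer number C15008 come from the passage, while the attribute name `type` on name, the `format` value `yyyymmdd`, the `check_number` child, the placement of `payment` inside `order_information`, and all data values are assumptions. The address element is omitted because its children are not described here.

```python
import xml.etree.ElementTree as ET

# Illustrative customer document; all data values are made up.
CUSTOMER_XML = """<?xml version="1.0"?>
<customers>
  <customer number="C15008">
    <name type="I">
      <lastname>Doe</lastname>
      <firstname>Jane</firstname>
      <middle_initial>Q</middle_initial>
    </name>
    <order_information>
      <order_number>00001</order_number>
      <order_date format="yyyymmdd">20240105</order_date>
      <shipping_date format="yyyymmdd">20240108</shipping_date>
      <total>125.50</total>
      <payment>
        <check>
          <check_number>1001</check_number>
        </check>
      </payment>
    </order_information>
    <order_information>
      <order_number>00002</order_number>
      <order_date format="yyyymmdd">20240212</order_date>
      <shipping_date format="yyyymmdd">20240214</shipping_date>
      <total>48.00</total>
      <payment>
        <check>
          <check_number>1002</check_number>
        </check>
      </payment>
    </order_information>
  </customer>
</customers>
"""

# Parsing succeeds only if the document is well formed.
root = ET.fromstring(CUSTOMER_XML)
customer = root.find("customer")

# The customer number is stored as an attribute, not as a child element.
customer_number = customer.get("number")

# The name structure groups its children, so the full name can be
# assembled in one step, with the parts separated by a space.
name = customer.find("name")
full_name = " ".join(child.text for child in name)

# order_information repeats, once per order.
orders = customer.findall("order_information")
order_numbers = [order.findtext("order_number") for order in orders]

print(customer_number, full_name, order_numbers)
```

Parsing this way confirms only that the tags nest and close correctly; it does not check the order or presence of elements, which is the job of a DTD.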

                 XML Document Type Definitions
                 Often the element structure of XML content is defined using a document type definition (DTD).
                 A DTD is used to determine whether the XML document content is valid—that is, whether it
                 conforms to the order and type of data that must be present in the document. The DTD is easy to
                 create and well supported by standard software. Once the DTD has been completed, it may be
                 used to validate the XML document using standard XML tools. The DTD is easier to create if a
                 data dictionary has been completed, since the analyst has worked with users and made decisions
                 on the structure of the data.
                     Figure 8.17 illustrates the document type definition for the Customer XML document.
                 Keywords, such as !DOCTYPE, indicating the start of the DTD, must be in capital letters.
                 !ELEMENT describes an element, and !ATTLIST describes an attribute, listing the element
                 name followed by the attribute name. An element declared with the keyword #PCDATA, for parsed
                 character data, is a primitive element, not further defined. When an element's declaration lists
                 other elements within parentheses, separated by commas, those are its child elements and they must
                 appear in the order listed. The statement <!ELEMENT name (lastname, firstname, middle_initial?)>
                 means that the name must have the last name followed by the first name followed by the middle initial.
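As a rough illustration of these keywords, the sketch below embeds a reduced DTD in a small document and lists its declarations using Python's standard `xml.parsers.expat` module. The DTD is an assumed fragment built from the description in the text, not a copy of Figure 8.17: the content model `(name)` for customer, the attribute name `type`, and the sample data are hypothetical. Note that expat only reads the declarations here; it does not validate the document against them.

```python
from xml.parsers import expat

# A reduced, illustrative DTD (internal subset) followed by a tiny document.
DOC = """<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name)>
  <!ATTLIST customer number ID #REQUIRED>
  <!ELEMENT name (lastname, firstname, middle_initial?)>
  <!ATTLIST name type (I|C) #REQUIRED>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT middle_initial (#PCDATA)>
]>
<customers>
  <customer number="C15008">
    <name type="I"><lastname>Doe</lastname><firstname>Jane</firstname></name>
  </customer>
</customers>
"""

elements = []    # element names, in declaration order
attributes = []  # (element name, attribute name, attribute type, required?)


def on_element_decl(name, model):
    # Called once for every <!ELEMENT ...> declaration.
    elements.append(name)


def on_attlist_decl(elname, attname, att_type, default, required):
    # Called once for every attribute declared in an <!ATTLIST ...>.
    attributes.append((elname, attname, att_type, bool(required)))


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

print(elements)
print(attributes)
```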
                     The question mark after “middle_initial” means that the element is optional and may be
                 left out of the document for a particular customer. A plus sign means that the element must occur
                 one or more times. Customers must contain at least one customer tag but could contain many
                 customer tags. An asterisk means that the element may occur zero or more times. Each customer may
                 have zero to many orders. A vertical bar separates two or more child elements that are mutually
                 exclusive. Payment contains either check or credit card as options.
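The occurrence indicators behave much like their regular-expression counterparts, which the following sketch exploits. It is a teaching aid under stated assumptions, not a DTD validator: the helper names `content_model_to_regex` and `children_match` are hypothetical, only element names, commas, vertical bars, parentheses, and the ?, + and * indicators are handled (no #PCDATA or mixed content), and the content models shown are fragments inferred from the text.

```python
import re


def content_model_to_regex(model):
    """Translate a simple DTD content model into a compiled regex.

    Each child element name becomes a group matching "name;", so a list
    of children can be tested as one string such as "lastname;firstname;".
    """
    compact = re.sub(r"\s+", "", model)
    # Wrap every element name in a non-capturing group.
    body = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    # The ?, +, * and | symbols already carry the same meaning in both notations.
    body = body.replace(",", "")
    return re.compile(body)


def children_match(model, children):
    """Return True if the sequence of child names satisfies the model."""
    sequence = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


NAME_MODEL = "(lastname, firstname, middle_initial?)"
print(children_match(NAME_MODEL, ["lastname", "firstname"]))                    # optional part omitted
print(children_match(NAME_MODEL, ["lastname", "firstname", "middle_initial"]))  # optional part present
print(children_match(NAME_MODEL, ["firstname", "lastname"]))                    # wrong order

print(children_match("(customer+)", []))                        # plus: at least one needed
print(children_match("(order_information*)", []))               # asterisk: zero is allowed
print(children_match("(check | credit_card)", ["check"]))       # bar: exactly one alternative
```

The first two calls and the last two print True; the wrong-order name and the empty customers list print False.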
                     The attribute list definition for a customer number contains the keyword ID (in uppercase letters).
                 This means that a given value of the number attribute may appear only once in the XML document as
                 the value of an ID attribute. In that respect, it is somewhat similar to a primary key. The difference
                 is that, if the document had several different elements, each with an ID attribute, the given ID
                 (C15008 in this example) could appear only once across all of them. An ID must start with a letter
                 or an underscore and cannot be solely a number. The reason for declaring the customer number as an
                 ID is to ensure that it is not repeated in a longer document. The keyword #REQUIRED means that the
                 attribute must be present. A keyword of #IMPLIED means that the attribute is optional. A document
                 may also have an IDREF attribute, which links one element with another that has an ID. The
                 ORDER tag has a customer_number attribute defined as an IDREF, and the value C15008 must