
          Chapter 4 Data Warehousing and Online Analytical Processing



              [Figure: a three-level tree of price intervals.
               Root: ($0 ... $1000].
               Middle level: ($0 ... $200], ($200 ... $400], ($400 ... $600], ($600 ... $800], ($800 ... $1000].
               Leaf level: ten $100-wide intervals, ($0 ... $100] through ($900 ... $1000], two under each middle-level interval.]


              Figure 4.11 A concept hierarchy for price.

                         be organized in a partial order, forming a lattice. An example of a partial order for the
                         time dimension based on the attributes day, week, month, quarter, and year is “day <
                         {month < quarter; week} < year.”¹ This lattice structure is shown in Figure 4.10(b).
                         A concept hierarchy that is a total or partial order among attributes in a database schema
                         is called a schema hierarchy. Concept hierarchies that are common to many applica-
                         tions (e.g., for time) may be predefined in the data mining system. Data mining systems
                         should provide users with the flexibility to tailor predefined hierarchies according to
                         their particular needs. For example, users may want to define a fiscal year starting on
                         April 1 or an academic year starting on September 1.
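
The partial order and the tailored year boundaries described above can be sketched in a few lines of Python. This is only an illustration, not part of any particular data mining system: the names `ROLLS_UP_TO`, `is_lower_than`, `comparable`, and `fiscal_year` are hypothetical, and labeling a fiscal year by the calendar year in which it starts is an assumed convention.

```python
from datetime import date

# Direct "rolls up to" edges of the time lattice
# day < {month < quarter; week} < year.
ROLLS_UP_TO = {
    "day": {"month", "week"},
    "month": {"quarter"},
    "quarter": {"year"},
    "week": {"year"},
    "year": set(),
}


def is_lower_than(a, b):
    """Return True if attribute a is strictly below attribute b in the partial order."""
    stack = list(ROLLS_UP_TO[a])
    seen = set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(ROLLS_UP_TO[node])
    return False


def comparable(a, b):
    """Two attributes are comparable if they are equal or one lies below the other."""
    return a == b or is_lower_than(a, b) or is_lower_than(b, a)


def fiscal_year(d, start_month=4, start_day=1):
    """Label a date with the calendar year in which its user-defined year begins.

    Assumed convention: a year starting on April 1, 2011 is labeled 2011.
    """
    if (d.month, d.day) >= (start_month, start_day):
        return d.year
    return d.year - 1
```

Here `comparable("week", "month")` is false, which is what makes the order partial rather than total, while `is_lower_than("day", "year")` holds along either branch. Calling `fiscal_year(d, start_month=9)` gives the academic-year variant under the same assumed labeling.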
                           Concept hierarchies may also be defined by discretizing or grouping values for a
                         given dimension or attribute, resulting in a set-grouping hierarchy. A total or partial
                         order can be defined among groups of values. An example of a set-grouping hierarchy is
                         shown in Figure 4.11 for the dimension price, where an interval ($X ... $Y] denotes the
                         range from $X (exclusive) to $Y (inclusive).
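
As a minimal sketch of such a set-grouping hierarchy, the following Python maps a price to its enclosing ($X ... $Y] interval at each level of Figure 4.11. The function names `price_interval` and `roll_up_path` are hypothetical, and the level widths (100, 200, 1000) are read off the figure; prices above $1000 fall outside the figure and are not handled specially here.

```python
import math


def price_interval(price, width):
    """Return (low, high) with low < price <= high for bins of the given width."""
    if price <= 0:
        raise ValueError("price must be positive: intervals exclude their lower bound")
    high = math.ceil(price / width) * width
    return (high - width, high)


def roll_up_path(price, widths=(100, 200, 1000)):
    """List the enclosing interval at each level, from most to least detailed."""
    return [price_interval(price, w) for w in widths]
```

Because the lower bound is exclusive and the upper bound inclusive, a price of exactly $200 belongs to ($100 ... $200] and not to ($200 ... $300]. Each interval in the returned path is contained in the next one, which reflects the total order among the groups in this example.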
                           There may be more than one concept hierarchy for a given attribute or dimension,
                         based on different user viewpoints. For instance, a user may prefer to organize price by
                         defining ranges for inexpensive, moderately priced, and expensive.
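
A user-defined alternative of this kind might look like the sketch below. The cut points ($100 and $500) are purely illustrative assumptions, since the text does not specify where one range ends and the next begins; `UPPER_BOUNDS`, `LABELS`, and `price_category` are hypothetical names.

```python
from bisect import bisect_left

# Assumed, illustrative cut points: each label covers prices up to and
# including its upper bound; "expensive" covers everything above the last one.
UPPER_BOUNDS = [100, 500]
LABELS = ["inexpensive", "moderately priced", "expensive"]


def price_category(price):
    """Map a price to a user-defined category using right-inclusive ranges."""
    return LABELS[bisect_left(UPPER_BOUNDS, price)]
```

Using `bisect_left` keeps the upper bounds inclusive, matching the ($X ... $Y] convention used for the interval-based hierarchy on the same attribute.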
                           Concept hierarchies may be provided manually by system users, domain experts, or
                         knowledge engineers, or may be automatically generated based on statistical analysis of
                         the data distribution. The automatic generation of concept hierarchies is discussed in
                         Chapter 3 as a preprocessing step in preparation for data mining.
                           Concept hierarchies allow data to be handled at varying levels of abstraction, as we
                         will see in Section 4.2.4.

                   4.2.4 Measures: Their Categorization and Computation
                         “How are measures computed?” To answer this question, we first study how measures can
                         be categorized. Note that a multidimensional point in the data cube space can be defined

                         1 Since a week often crosses the boundary of two consecutive months, it is usually not treated as a lower
                         abstraction of month. Instead, it is often treated as a lower abstraction of year, since a year contains
                         approximately 52 weeks.