Chapter 4 Data Warehousing and Online Analytical Processing
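[Figure 4.11: three-level hierarchy of price intervals, from most general to most specific]
($0 ...$1000]
($0 ...$200], ($200 ...$400], ($400 ...$600], ($600 ...$800], ($800 ...$1000]
($0 ...$100], ($100 ...$200], ($200 ...$300], ($300 ...$400], ($400 ...$500], ($500 ...$600], ($600 ...$700], ($700 ...$800], ($800 ...$900], ($900 ...$1000]
Figure 4.11 A concept hierarchy for price.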
be organized in a partial order, forming a lattice. An example of a partial order for the
time dimension based on the attributes day, week, month, quarter, and year is “day <
{month < quarter; week} < year.”¹ This lattice structure is shown in Figure 4.10(b).
A concept hierarchy that is a total or partial order among attributes in a database schema
is called a schema hierarchy. Concept hierarchies that are common to many applications
(e.g., for time) may be predefined in the data mining system. Data mining systems
should provide users with the flexibility to tailor predefined hierarchies according to
their particular needs. For example, users may want to define a fiscal year starting on
April 1 or an academic year starting on September 1.
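As a minimal sketch of these two ideas, the partial order on the time attributes can be stored as a small graph of "rolls up to" edges, and a user-tailored hierarchy can be added as an extra mapping. The names ROLLS_UP_TO, is_lower, and fiscal_year are hypothetical, and labeling a fiscal year by the calendar year in which it starts is an assumption made only for illustration (some organizations label it by the year in which it ends).

```python
from datetime import date

# Direct "rolls up to" edges for the lattice
# day < {month < quarter; week} < year.
ROLLS_UP_TO = {
    "day": {"month", "week"},
    "month": {"quarter"},
    "quarter": {"year"},
    "week": {"year"},
    "year": set(),
}


def is_lower(a: str, b: str) -> bool:
    """Return True if level a lies strictly below level b in the partial order."""
    frontier = set(ROLLS_UP_TO[a])
    seen = set()
    while frontier:
        level = frontier.pop()
        if level == b:
            return True
        if level not in seen:
            seen.add(level)
            frontier |= ROLLS_UP_TO[level]
    return False


def fiscal_year(d: date, start_month: int = 4) -> int:
    """Map a date to a fiscal year that begins on the 1st of start_month.

    Assumption: the fiscal year is labeled by the calendar year in which it starts.
    """
    return d.year if d.month >= start_month else d.year - 1
```

Because week and month are incomparable, is_lower("week", "month") and is_lower("month", "week") are both false, which is what makes the order partial rather than total.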
Concept hierarchies may also be defined by discretizing or grouping values for a
given dimension or attribute, resulting in a set-grouping hierarchy. A total or partial
order can be defined among groups of values. An example of a set-grouping hierarchy is
shown in Figure 4.11 for the dimension price, where an interval ($X ...$Y] denotes the
range from $X (exclusive) to $Y (inclusive).
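The interval convention can be illustrated with a short sketch that maps a price to its enclosing interval at each level of Figure 4.11. The function names price_interval and roll_up are hypothetical, and the level widths (100, 200, 1000) are read off the figure.

```python
import math


def price_interval(price: float, width: int) -> tuple[int, int]:
    """Return (low, high) with low < price <= high, bounds at multiples of width."""
    if not 0 < price <= 1000:
        raise ValueError("price must lie in ($0 ...$1000]")
    # Rounding up makes the upper bound inclusive and the lower bound exclusive.
    high = math.ceil(price / width) * width
    return (high - width, high)


def roll_up(price: float) -> list[tuple[int, int]]:
    """List the enclosing intervals from the most specific level to the most general."""
    return [price_interval(price, width) for width in (100, 200, 1000)]
```

A price of exactly $200 falls in ($100 ...$200] rather than ($200 ...$300], since the upper bound is inclusive.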
There may be more than one concept hierarchy for a given attribute or dimension,
based on different user viewpoints. For instance, a user may prefer to organize price by
defining ranges for inexpensive, moderately priced, and expensive.
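Such a user-defined grouping might be sketched as follows. The cut points ($100 and $500) are hypothetical values chosen only for illustration, since the text does not specify them, and price_label is an assumed name.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds for each user-defined range.
UPPER_BOUNDS = [100, 500, 1000]
LABELS = ["inexpensive", "moderately priced", "expensive"]


def price_label(price: float) -> str:
    """Return the user-defined label whose range (low ...high] contains price."""
    if not 0 < price <= UPPER_BOUNDS[-1]:
        raise ValueError("price must lie in ($0 ...$1000]")
    # bisect_left finds the first bound >= price, so upper bounds are inclusive.
    return LABELS[bisect_left(UPPER_BOUNDS, price)]
```

This hierarchy coexists with the interval hierarchy of Figure 4.11: the same price attribute is simply viewed through a different grouping.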
Concept hierarchies may be provided manually by system users, domain experts, or
knowledge engineers, or may be automatically generated based on statistical analysis of
the data distribution. The automatic generation of concept hierarchies is discussed in
Chapter 3 as a preprocessing step in preparation for data mining.
Concept hierarchies allow data to be handled at varying levels of abstraction, as we
will see in Section 4.2.4.
4.2.4 Measures: Their Categorization and Computation
“How are measures computed?” To answer this question, we first study how measures can
be categorized. Note that a multidimensional point in the data cube space can be defined
¹ Since a week often crosses the boundary of two consecutive months, it is usually not treated as a lower
abstraction of month. Instead, it is often treated as a lower abstraction of year, since a year contains
approximately 52 weeks.