4.5 Data Generalization by Attribute-Oriented Induction

User control versus automation: Online analytical processing in data warehouses
is a user-controlled process. The selection of dimensions and the application of
OLAP operations (e.g., drill-down, roll-up, slicing, and dicing) are primarily directed
and controlled by users. Although the control in most OLAP systems is quite
user-friendly, users do require a good understanding of the role of each dimension.
Furthermore, in order to find a satisfactory description of the data, users may need to
specify a long sequence of OLAP operations. It is often desirable to have a more
automated process that helps users determine which dimensions (or attributes) should
be included in the analysis, and the degree to which the given data set should be
generalized in order to produce an interesting summarization of the data.

This section presents an alternative method for concept description, called
attribute-oriented induction, which works for complex data types and relies on a
data-driven generalization process.

4.5.1 Attribute-Oriented Induction for Data Characterization
The attribute-oriented induction (AOI) approach to concept description was first
proposed in 1989, a few years before the introduction of the data cube approach. The data
cube approach is essentially based on materialized views of the data, which typically
have been precomputed in a data warehouse. In general, it performs offline
aggregation before an OLAP or data mining query is submitted for processing. On the
other hand, the attribute-oriented induction approach is basically a query-oriented,
generalization-based, online data analysis technique. Note that there is no inherent
barrier distinguishing the two approaches based on online aggregation versus offline
precomputation. Some aggregations in the data cube can be computed online, while
offline precomputation of multidimensional space can speed up attribute-oriented
induction as well.

The general idea of attribute-oriented induction is to first collect the task-relevant
data using a database query and then perform generalization based on the examination
of the number of each attribute's distinct values in the relevant data set. The
generalization is performed by either attribute removal or attribute generalization. Aggregation
is performed by merging identical generalized tuples and accumulating their
respective counts. This reduces the size of the generalized data set. The resulting generalized
relation can be mapped into different forms (e.g., charts or rules) for presentation to
the user.
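
The steps just described can be illustrated with a minimal Python sketch. It is not
the book's algorithm as such: the function name, the single distinct-value threshold,
the dictionary encoding of concept hierarchies, and the sample tuples are all
assumptions made for illustration. An attribute with too many distinct values is
generalized one hierarchy level at a time; if it has no hierarchy, or still has too
many values at the top level, it is removed. Identical generalized tuples are then
merged and their counts accumulated.

```python
from collections import Counter


def attribute_oriented_induction(rows, hierarchies, threshold):
    """Generalize task-relevant tuples and merge duplicates with counts.

    rows        -- list of dicts, all with the same keys (the task-relevant data)
    hierarchies -- hypothetical encoding: attribute -> {value: more general value}
    threshold   -- assumed maximum number of distinct values kept per attribute
    """
    if not rows:
        return []
    generalized = [dict(r) for r in rows]  # work on a copy
    kept = []
    for attr in rows[0]:
        distinct = {r[attr] for r in generalized}
        mapping = hierarchies.get(attr)
        # Attribute generalization: climb one level while too many values remain.
        while len(distinct) > threshold and mapping:
            climbed = {v: mapping.get(v, v) for v in distinct}
            if all(k == v for k, v in climbed.items()):
                break  # already at the top of the hierarchy
            for r in generalized:
                r[attr] = climbed[r[attr]]
            distinct = set(climbed.values())
        # Attribute removal: drop the attribute if it is still too specific.
        if len(distinct) <= threshold:
            kept.append(attr)
    # Aggregation: merge identical generalized tuples, accumulating counts.
    counts = Counter(tuple(r[a] for a in kept) for r in generalized)
    return [dict(zip(kept, key), count=n) for key, n in counts.items()]


# Hypothetical sample data, not taken from the Big University database.
students = [
    {"name": "Ann", "major": "CS", "birth_place": "Vancouver"},
    {"name": "Bob", "major": "Physics", "birth_place": "Toronto"},
    {"name": "Cho", "major": "Math", "birth_place": "Seattle"},
    {"name": "Dee", "major": "CS", "birth_place": "Montreal"},
]
hierarchies = {
    "major": {"CS": "Science", "Physics": "Science", "Math": "Science"},
    "birth_place": {"Vancouver": "Canada", "Toronto": "Canada",
                    "Montreal": "Canada", "Seattle": "USA"},
}
summary = attribute_oriented_induction(students, hierarchies, threshold=2)
for row in summary:
    print(row)
```

In this sketch, name is removed because it has four distinct values and no hierarchy,
while major and birth_place are each generalized one level, so the four input tuples
collapse into two generalized tuples with counts 3 and 1.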

The following illustrates the process of attribute-oriented induction. We first discuss
its use for characterization. The method is extended for the mining of class comparisons
in Section 4.5.3.

Example 4.11 A data mining query for characterization. Suppose that a user wants to describe
the general characteristics of graduate students in the Big University database, given
the attributes name, gender, major, birth place, birth date, residence, phone# (telephone
