HAN, 2011/6/1 (ISBN 9780123814791). Chapter 4 Data Warehousing and Online Analytical Processing, page 180
attribute-oriented induction. Concept description is the most basic form of descriptive
data mining. It describes a given set of task-relevant data in a concise, summarizing
manner, presenting interesting general properties of the data. Concept (or class)
description consists of characterization and comparison (or discrimination). The
former summarizes and describes a data collection, called the target class, whereas
the latter summarizes and distinguishes one data collection, called the target class,
from other data collection(s), collectively called the contrasting class(es).
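To make the difference between the two forms of concept description concrete, here is a minimal Python sketch. It assumes the task-relevant data have already been collected as a list of records (dicts); the student records, the attribute `major`, and the helpers `characterize` and `compare` are hypothetical illustrations, not part of any library. Characterization summarizes the target class alone, while comparison places the target-class summary next to the contrasting-class summary.

```python
from collections import Counter

# Hypothetical task-relevant data: the target class (graduate students)
# and the contrasting class (undergraduate students).
graduates = [{"major": "CS"}, {"major": "CS"}, {"major": "Physics"}, {"major": "Math"}]
undergraduates = [{"major": "CS"}, {"major": "Arts"}]

def characterize(records, attr):
    """Characterization: relative frequency of each value of `attr` in one class."""
    counts = Counter(r[attr] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def compare(target, contrasting, attr):
    """Comparison: target-class and contrasting-class summaries side by side."""
    t = characterize(target, attr)
    c = characterize(contrasting, attr)
    # Each value maps to (share in target class, share in contrasting class).
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}

print(characterize(graduates, "major"))
# {'CS': 0.5, 'Physics': 0.25, 'Math': 0.25}
print(compare(graduates, undergraduates, "major"))
# {'Arts': (0.0, 0.5), 'CS': (0.5, 0.5), 'Math': (0.25, 0.0), 'Physics': (0.25, 0.0)}
```

Relative frequencies rather than raw counts are used here so that classes of different sizes can be compared directly.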
Concept characterization can be implemented using data cube (OLAP-based)
approaches or the attribute-oriented induction approach. Both are attribute-
or dimension-based generalization approaches. The attribute-oriented induction
approach consists of the following techniques: data focusing, data generalization by
attribute removal or attribute generalization, count and aggregate value accumulation,
attribute generalization control, and visualization of the generalized data.
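The core of attribute-oriented induction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the exact algorithm: each concept hierarchy is a hypothetical child-to-parent dict, a single distinct-value threshold serves as the attribute generalization control for every attribute, and data focusing and visualization are left out. An attribute with too many distinct values is removed when it has no hierarchy and generalized when it does; identical generalized tuples are then merged with accumulated counts. The function names and data are hypothetical.

```python
from collections import Counter

def generalize_column(values, parent, threshold):
    """Attribute generalization: replace each value by its parent concept,
    climbing one level at a time until at most `threshold` distinct values remain."""
    while len(set(values)) > threshold:
        climbed = [parent.get(v, v) for v in values]
        if climbed == values:  # top of the hierarchy reached; stop climbing
            break
        values = climbed
    return values

def attribute_oriented_induction(records, hierarchies, threshold):
    """Generalize task-relevant records and merge identical generalized tuples."""
    kept, columns = [], []
    for attr in records[0]:
        values = [r[attr] for r in records]
        if len(set(values)) > threshold:
            if attr not in hierarchies:
                continue  # attribute removal: many distinct values, no hierarchy
            values = generalize_column(values, hierarchies[attr], threshold)
        kept.append(attr)
        columns.append(values)
    # Count accumulation: identical generalized tuples are merged and counted.
    return kept, Counter(zip(*columns))

# Hypothetical task-relevant data and concept hierarchies (child -> parent).
students = [
    {"name": "Jim", "major": "CS", "birth_city": "Vancouver"},
    {"name": "Scott", "major": "CS", "birth_city": "Montreal"},
    {"name": "Laura", "major": "Physics", "birth_city": "Seattle"},
    {"name": "Ann", "major": "Math", "birth_city": "Toronto"},
]
hierarchies = {
    "major": {"CS": "Science", "Physics": "Science", "Math": "Science"},
    "birth_city": {"Vancouver": "Canada", "Montreal": "Canada",
                   "Toronto": "Canada", "Seattle": "USA"},
}
attrs, counts = attribute_oriented_induction(students, hierarchies, threshold=2)
print(attrs)   # ['major', 'birth_city']
print(counts)  # Counter({('Science', 'Canada'): 3, ('Science', 'USA'): 1})
```

Here `name` is removed because it has four distinct values and no hierarchy, while `major` and `birth_city` each climb one level to fit within the threshold of 2.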
Concept comparison can be performed using the attribute-oriented induction or
data cube approaches in a manner similar to concept characterization. Generalized
tuples from the target and contrasting classes can be quantitatively compared and
contrasted.
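One way to compare generalized tuples quantitatively is a discrimination weight (d-weight): for a generalized tuple q, the count of q in the target class divided by its total count across the target and contrasting classes. The Python sketch below assumes one contrasting class and positive counts, and the function name `d_weights` and the input data are hypothetical. A weight near 1 means the tuple occurs mostly in the target class; a weight near 0 means it occurs mostly in the contrasting class.

```python
from collections import Counter

def d_weights(target_counts, contrast_counts):
    """Discrimination weight of each generalized tuple q for the target class:
    count(q in target) / (count(q in target) + count(q in contrasting))."""
    weights = {}
    for q in set(target_counts) | set(contrast_counts):
        in_target = target_counts[q]  # a Counter returns 0 for missing keys
        total = in_target + contrast_counts[q]
        weights[q] = in_target / total
    return weights

# Hypothetical generalized tuples, e.g. produced by attribute-oriented induction.
graduates = Counter({("Science", "Canada"): 3, ("Science", "USA"): 1})
undergraduates = Counter({("Science", "Canada"): 1, ("Arts", "USA"): 2})

for q, w in sorted(d_weights(graduates, undergraduates).items()):
    print(q, round(w, 2))
# ('Arts', 'USA') 0.0
# ('Science', 'Canada') 0.75
# ('Science', 'USA') 1.0
```

With several contrasting classes, the denominator would sum the counts of q over all of them.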
4.7 Exercises
4.1 State why, for the integration of multiple heterogeneous information sources, many
companies in industry prefer the update-driven approach (which constructs and uses
data warehouses), rather than the query-driven approach (which applies wrappers and
integrators). Describe situations where the query-driven approach is preferable to the
update-driven approach.
4.2 Briefly compare the following concepts. You may use an example to explain your
point(s).
(a) Snowflake schema, fact constellation, starnet query model
(b) Data cleaning, data transformation, refresh
(c) Discovery-driven cube, multifeature cube, virtual warehouse
4.3 Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
and the two measures count and charge, where charge is the fee that a doctor charges a
patient for a visit.
(a) Enumerate three classes of schemas that are popularly used for modeling data
warehouses.
(b) Draw a schema diagram for the above data warehouse using one of the schema
classes listed in (a).