
          Chapter 4 Data Warehousing and Online Analytical Processing



                           attribute-oriented induction. Concept description is the most basic form of
                           descriptive data mining. It describes a given set of task-relevant data in a
                           concise, summarized manner, presenting interesting general properties of the
                           data. Concept (or class) description consists of characterization and comparison
                           (or discrimination). The former summarizes and describes a data collection,
                           called the target class, whereas the latter summarizes and distinguishes one data
                           collection, called the target class, from other data collection(s), collectively
                           called the contrasting class(es).

                           Concept characterization can be implemented using data cube (OLAP-based)
                           approaches and the attribute-oriented induction approach. These are attribute-
                           or dimension-based generalization approaches. The attribute-oriented induction
                           approach consists of the following techniques: data focusing, data generalization by
                           attribute removal or attribute generalization, count and aggregate value accumulation,
                           attribute generalization control, and generalization data visualization.

                           Concept comparison can be performed using the attribute-oriented induction or
                           data cube approaches in a manner similar to concept characterization. Generalized
                           tuples from the target and contrasting classes can be quantitatively compared and
                           contrasted.
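
                           The attribute-oriented induction techniques listed above can be illustrated with a
                           minimal Python sketch. It is not the book's algorithm: the function name, the
                           student records, the concept hierarchies, and the threshold value are all
                           hypothetical, and each attribute is generalized by at most one hierarchy level,
                           whereas a fuller treatment would keep climbing until the threshold is met.

```python
from collections import Counter

def attribute_oriented_induction(rows, relevant, hierarchies, threshold):
    """Generalize `rows` (a list of dicts) and accumulate counts.

    relevant:    attribute names kept by data focusing
    hierarchies: attribute -> function mapping a value to a higher-level concept
    threshold:   maximum number of distinct values tolerated per attribute
                 (a simple form of attribute generalization control)
    """
    # Data focusing: keep only the task-relevant attributes.
    data = [{a: r[a] for a in relevant} for r in rows]
    kept = []
    for attr in relevant:
        distinct = {r[attr] for r in data}
        if len(distinct) > threshold:
            if attr in hierarchies:
                # Attribute generalization: climb one level of the hierarchy.
                for r in data:
                    r[attr] = hierarchies[attr](r[attr])
            else:
                # Attribute removal: many distinct values, no hierarchy to climb.
                continue
        kept.append(attr)
    # Count accumulation: identical generalized tuples are merged.
    return Counter(tuple(r[a] for a in kept) for r in data)

# Hypothetical data and concept hierarchies, for illustration only.
COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"}
students = [
    {"name": "Ann", "city": "Vancouver", "age": 21},
    {"name": "Bob", "city": "Toronto", "age": 23},
    {"name": "Cho", "city": "Seattle", "age": 34},
    {"name": "Dee", "city": "Vancouver", "age": 22},
]
hierarchies = {
    "city": lambda c: COUNTRY[c],
    "age": lambda a: "20-29" if a < 30 else "30+",
}

result = attribute_oriented_induction(
    students, ["name", "city", "age"], hierarchies, threshold=2
)
for generalized_tuple, count in result.most_common():
    print(generalized_tuple, count)
```

                           In this sketch, name is removed because it has many distinct values and no
                           hierarchy, while city and age are generalized, so the four records collapse into
                           generalized tuples with counts. Running the same function on a contrasting class
                           would give counts that can be compared against those of the target class.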




                 4.7     Exercises


                     4.1 State why, for the integration of multiple heterogeneous information sources, many
                         companies in industry prefer the update-driven approach (which constructs and uses
                         data warehouses), rather than the query-driven approach (which applies wrappers and
                         integrators). Describe situations where the query-driven approach is preferable to the
                         update-driven approach.
                     4.2 Briefly compare the following concepts. You may use an example to explain your
                         point(s).

                         (a) Snowflake schema, fact constellation, starnet query model
                         (b) Data cleaning, data transformation, refresh
                         (c) Discovery-driven cube, multifeature cube, virtual warehouse
                     4.3 Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
                         and the two measures count and charge, where charge is the fee that a doctor charges a
                         patient for a visit.
                         (a) Enumerate three classes of schemas that are popularly used for modeling data
                            warehouses.
                         (b) Draw a schema diagram for the above data warehouse using one of the schema
                            classes listed in (a).