
HAN 08-ch01-001-038-9780123814791


          16    Chapter 1 Introduction                       2011/6/1  3:12  Page 16  #16



                           There are several methods for effective data summarization and characterization.
                         Simple data summaries based on statistical measures and plots are described in
                         Chapter 2. The data cube-based OLAP roll-up operation (Section 1.3.2) can be used
                         to perform user-controlled data summarization along a specified dimension. This process
                         is further detailed in Chapters 4 and 5, which discuss data warehousing. An
                         attribute-oriented induction technique can be used to perform data generalization and
                         characterization without step-by-step user interaction. This technique is also described
                         in Chapter 4.
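
As a rough illustration of summarization by roll-up along one dimension, the sketch below aggregates a sales measure from city level to country level. The records, the city-to-country hierarchy, and the name `roll_up_location` are all hypothetical and chosen only for illustration; this is not the OLAP implementation discussed in the later chapters.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Hypothetical base cells: (city, quarter, dollars_sold).
SALES = [
    ("Vancouver", "Q1", 400.0),
    ("Toronto", "Q1", 350.0),
    ("Chicago", "Q1", 500.0),
    ("New York", "Q1", 650.0),
    ("Vancouver", "Q2", 420.0),
    ("Chicago", "Q2", 480.0),
]


def roll_up_location(rows, hierarchy):
    """Sum dollars_sold per (country, quarter), climbing city -> country."""
    totals = defaultdict(float)
    for city, quarter, dollars in rows:
        totals[(hierarchy[city], quarter)] += dollars
    return dict(totals)


summary = roll_up_location(SALES, CITY_TO_COUNTRY)
for (country, quarter), dollars in sorted(summary.items()):
    print(f"{country:7s} {quarter} {dollars:9.2f}")
```

The user controls the summary by choosing which dimension to climb and how far; here only the location dimension is generalized, while the time dimension stays at quarter level.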
                           The output of data characterization can be presented in various forms. Examples
                         include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional
                         tables, including crosstabs. The resulting descriptions can also be presented as
                         generalized relations or in rule form (called characteristic rules).


            Example 1.5 Data characterization. A customer relationship manager at AllElectronics may order the
                         following data mining task: Summarize the characteristics of customers who spend more
                         than $5000 a year at AllElectronics. The result is a general profile of these customers,
                         such as that they are 40 to 50 years old, employed, and have excellent credit ratings. The
                         data mining system should allow the customer relationship manager to drill down on
                         any dimension, such as on occupation to view these customers according to their type of
                         employment.
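
A minimal sketch of the task in Example 1.5, assuming a small in-memory list of customer records. The data, the decade-wide age ranges, and the function names are illustrative assumptions, not part of AllElectronics or of any particular data mining system.

```python
from collections import Counter

# Hypothetical records: (age, occupation, credit_rating, annual_spend_dollars).
CUSTOMERS = [
    (45, "engineer", "excellent", 7200),
    (48, "teacher", "excellent", 5600),
    (42, "engineer", "excellent", 9100),
    (35, "nurse", "good", 6100),
    (29, "student", "fair", 800),
    (63, "retired", "good", 1500),
]


def age_range(age):
    """Generalize an exact age to a decade-wide range, e.g. 45 -> '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def characterize(rows, min_spend):
    """Select the target class and count its generalized (age range, credit) tuples."""
    target = [r for r in rows if r[3] > min_spend]
    profile = Counter((age_range(age), credit) for age, _, credit, _ in target)
    return target, profile


def drill_down_occupation(target):
    """Drill down on the occupation dimension within the target class."""
    return Counter(occupation for _, occupation, _, _ in target)


target, profile = characterize(CUSTOMERS, min_spend=5000)
by_occupation = drill_down_occupation(target)
print(profile.most_common())
print(by_occupation.most_common())
```

The general profile comes from counting generalized tuples in the target class; drilling down simply regroups the same target customers by a finer attribute.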


                           Data discrimination is a comparison of the general features of the target class data
                         objects against the general features of objects from one or multiple contrasting classes.
                         The target and contrasting classes can be specified by a user, and the corresponding
                         data objects can be retrieved through database queries. For example, a user may want to
                         compare the general features of software products with sales that increased by 10% last
                         year against those with sales that decreased by at least 30% during the same period. The
                         methods used for data discrimination are similar to those used for data characterization.
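
The sketch below shows one way the target and contrasting classes could be retrieved through database queries, using an in-memory SQLite database. The `product` table, its columns, and its rows are hypothetical, and reading "increased by 10%" as "at least 10%" is an assumption made here for the sake of a concrete query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "name TEXT, category TEXT, price_band TEXT, sales_change_pct REAL)"
)
# Hypothetical rows; sales_change_pct is last year's change in sales.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        ("PhotoEdit", "software", "low", 12.0),
        ("TaxHelper", "software", "low", 25.0),
        ("CadSuite", "software", "high", 15.0),
        ("OldOffice", "software", "high", -35.0),
        ("DiskTool", "software", "high", -40.0),
        ("GameBox", "software", "low", -32.0),
        ("NoteApp", "software", "low", 3.0),    # belongs to neither class
        ("Laptop X", "hardware", "high", 20.0),  # not a software product
    ],
)

# Target class: software whose sales rose by at least 10% (assumed reading).
TARGET_SQL = (
    "SELECT price_band, COUNT(*) FROM product "
    "WHERE category = 'software' AND sales_change_pct >= 10 "
    "GROUP BY price_band ORDER BY price_band"
)
# Contrasting class: software whose sales fell by at least 30%.
CONTRAST_SQL = (
    "SELECT price_band, COUNT(*) FROM product "
    "WHERE category = 'software' AND sales_change_pct <= -30 "
    "GROUP BY price_band ORDER BY price_band"
)

target_profile = conn.execute(TARGET_SQL).fetchall()
contrast_profile = conn.execute(CONTRAST_SQL).fetchall()
print("target:  ", target_profile)
print("contrast:", contrast_profile)
```

Each query both selects a class and summarizes it on one generalized attribute (`price_band`), so the two result sets can be compared side by side.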
                           “How are discrimination descriptions output?” The forms of output presentation
                         are similar to those for characteristic descriptions, although discrimination descriptions
                         should include comparative measures that help to distinguish between the target
                         and contrasting classes. Discrimination descriptions expressed in the form of rules are
                         referred to as discriminant rules.

            Example 1.6 Data discrimination. A customer relationship manager at AllElectronics may want to
                         compare two groups of customers—those who shop for computer products regularly
                         (e.g., more than twice a month) and those who rarely shop for such products (e.g.,
                         fewer than three times a year). The resulting description provides a general comparative
                         profile of these customers, such as that 80% of the customers who frequently purchase
                         computer products are between 20 and 40 years old and have a university education,
                         whereas 60% of the customers who infrequently buy such products are either seniors or
                         youths, and have no university degree. Drilling down on a dimension like occupation,
                         or adding a new dimension like income level, may help to find even more discriminative
                         features between the two classes.
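
The comparison in Example 1.6 can be sketched as follows, assuming a small list of customer records. The records are contrived so that the percentages match those quoted in the example; the age-group boundaries, the purchase-count thresholds (more than 24 per year for "frequent", fewer than 3 for "infrequent"), and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: (age, has_university_degree, computer_purchases_per_year).
CUSTOMERS = [
    (25, True, 30),
    (33, True, 40),
    (38, True, 26),
    (29, True, 36),
    (45, False, 28),
    (70, False, 1),
    (16, False, 2),
    (68, False, 0),
    (35, True, 1),
    (50, True, 2),
    (30, True, 10),  # falls in neither class
]


def age_group(age):
    """Generalize an exact age to a coarse group (boundaries are assumed)."""
    if age < 20:
        return "youth"
    if age <= 40:
        return "20-40"
    if age < 65:
        return "41-64"
    return "senior"


def class_shares(rows):
    """Percentage of each generalized (age group, degree) tuple within one class."""
    counts = Counter((age_group(age), degree) for age, degree, _ in rows)
    return {key: 100.0 * n / len(rows) for key, n in counts.items()}


frequent = [r for r in CUSTOMERS if r[2] > 24]   # more than twice a month
infrequent = [r for r in CUSTOMERS if r[2] < 3]  # fewer than three times a year

freq_shares = class_shares(frequent)
infreq_shares = class_shares(infrequent)

# Print a side-by-side comparative measure for every generalized tuple.
for key in sorted(set(freq_shares) | set(infreq_shares)):
    print(key, freq_shares.get(key, 0.0), infreq_shares.get(key, 0.0))
```

Reporting the share of each generalized tuple within both classes is one simple comparative measure; tuples with a large gap between the two columns are the more discriminative ones.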