
          176   Chapter 4 Data Warehousing and Online Analytical Processing



                           comparison between the target and contrasting classes. The user can adjust the
                           comparison description by applying drill-down, roll-up, and other OLAP operations to
                           the target and contrasting classes, as desired.

                           The preceding discussion outlines a general algorithm for mining comparisons
                         in databases. Unlike characterization, this algorithm involves synchronous
                         generalization of the target class with the contrasting classes, so that the classes
                         are compared simultaneously at the same abstraction levels.
                           Example 4.14 mines a class comparison describing the graduate and undergraduate
                         students at Big University.
           Example 4.14 Mining a class comparison. Suppose that you would like to compare the general
                         properties of the graduate and undergraduate students at Big University, given the attributes
                         name, gender, major, birth_place, birth_date, residence, phone#, and gpa.
                           This data mining task can be expressed in DMQL as follows:

                             use Big_University_DB
                             mine comparison as “grad vs undergrad students”
                             in relevance to name, gender, major, birth_place, birth_date, residence,
                                 phone#, gpa
                             for “graduate students”
                             where status in “graduate”
                             versus “undergraduate students”
                             where status in “undergraduate”
                             analyze count%
                             from student
                         Let’s see how this typical example of a data mining query for mining comparison
                         descriptions can be processed.
                           First, the query is transformed into two relational queries that collect two sets of task-
                         relevant data: one for the initial target-class working relation and the other for the initial
                         contrasting-class working relation, as shown in Tables 4.8 and 4.9. This can also be viewed
                         as the construction of a data cube, where the status {graduate, undergraduate} serves as
                         one dimension, and the other attributes form the remaining dimensions.
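
The first step can be sketched in Python as a selection on status followed by a projection onto the task-relevant attributes. This is only an illustration under stated assumptions: the rows are made-up placeholders (they are not the contents of Tables 4.8 and 4.9), and the helper name working_relation is hypothetical.

```python
# Task-relevant attributes named in the query (written with underscores as keys).
ATTRIBUTES = ["name", "gender", "major", "birth_place", "birth_date",
              "residence", "phone#", "gpa"]

# Made-up placeholder rows standing in for the student relation.
student = [
    {"name": "student_1", "gender": "F", "major": "physics",
     "birth_place": "city_a", "birth_date": "1985-01-01",
     "residence": "address_1", "phone#": "000-0001", "gpa": 3.8,
     "status": "graduate"},
    {"name": "student_2", "gender": "M", "major": "chemistry",
     "birth_place": "city_b", "birth_date": "1992-01-01",
     "residence": "address_2", "phone#": "000-0002", "gpa": 3.1,
     "status": "undergraduate"},
    {"name": "student_3", "gender": "F", "major": "accounting",
     "birth_place": "city_c", "birth_date": "1993-01-01",
     "residence": "address_3", "phone#": "000-0003", "gpa": 3.4,
     "status": "undergraduate"},
]


def working_relation(rows, status_value, attributes=ATTRIBUTES):
    """Select the rows of one class and project them onto the given attributes."""
    return [{a: row[a] for a in attributes}
            for row in rows if row["status"] == status_value]


# One relational query per class, as described in the text.
target_class = working_relation(student, "graduate")
contrasting_class = working_relation(student, "undergraduate")
print(len(target_class), len(contrasting_class))
```

Because status is used only to split the data, it no longer appears as a column in either working relation; in the data cube view it plays the role of the class dimension.
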
                           Second, dimension relevance analysis can be performed, when necessary, on the two
                         classes of data. After this analysis, irrelevant or weakly relevant dimensions (e.g., name,
                         gender, birth_place, residence, and phone#) are removed from the resulting classes. Only
                         the highly relevant attributes are included in the subsequent analysis.
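
The text above does not say which relevance measure is used. As one possible choice, the following sketch ranks attributes by information gain with respect to the class label and keeps those above a cutoff. The sample rows, the age_range attribute, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, class_attr="status"):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    labels = [row[class_attr] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[class_attr])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up rows: gender is evenly mixed in both classes, age_range is not.
rows = [
    {"gender": "M", "age_range": "under_25", "status": "undergraduate"},
    {"gender": "F", "age_range": "under_25", "status": "undergraduate"},
    {"gender": "M", "age_range": "25_and_over", "status": "graduate"},
    {"gender": "F", "age_range": "25_and_over", "status": "graduate"},
]

THRESHOLD = 0.1  # hypothetical relevance cutoff
relevant = [a for a in ("gender", "age_range")
            if information_gain(rows, a) >= THRESHOLD]
print(relevant)
```

Raw information gain favors attributes with many distinct values, so a nearly unique attribute such as name would score misleadingly high; such attributes have to be screened out by other means, for example by a limit on the number of distinct values.
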
                           Third, synchronous generalization is performed on the target class to the levels
                         controlled by user- or expert-specified dimension thresholds, forming the prime target class
                         relation. The contrasting class is generalized to the same levels as those in the prime
                         target class relation, forming the prime contrasting class(es) relation, as presented in
                         Tables 4.10 and 4.11. These tables show that, in comparison with undergraduate students,
                         graduate students tend to be older and have a higher GPA in general.
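
Synchronous generalization can be sketched by passing both classes through the same generalization function, so that their tuples end up at identical abstraction levels and can be compared by count%. Everything specific here is an assumption: the concept hierarchies (MAJOR_TO_FIELD, age_range, gpa_level), the GPA cutoffs, and the sample rows are hypothetical, and age is assumed to have been derived from birth_date beforehand.

```python
from collections import Counter

# Hypothetical concept hierarchy for major.
MAJOR_TO_FIELD = {"physics": "Science", "chemistry": "Science",
                  "accounting": "Business"}


def age_range(age):
    """Map an age to a five-year range such as 21...25 or 26...30."""
    low = (age - 1) // 5 * 5 + 1
    return f"{low}...{low + 4}"


def gpa_level(gpa):
    """Map a numeric GPA to a coarse level (cutoffs are illustrative)."""
    if gpa >= 3.75:
        return "excellent"
    if gpa >= 3.25:
        return "very_good"
    return "good"


def generalize(rows):
    """Generalize each row, merge identical tuples, and return count% per tuple."""
    counts = Counter((MAJOR_TO_FIELD[r["major"]], age_range(r["age"]),
                      gpa_level(r["gpa"])) for r in rows)
    total = sum(counts.values())
    return {tup: 100.0 * n / total for tup, n in counts.items()}


# Made-up working relations after relevance analysis.
graduates = [
    {"major": "physics", "age": 27, "gpa": 3.8},
    {"major": "chemistry", "age": 28, "gpa": 3.9},
    {"major": "accounting", "age": 24, "gpa": 3.5},
    {"major": "physics", "age": 29, "gpa": 3.6},
]
undergraduates = [
    {"major": "physics", "age": 19, "gpa": 3.0},
    {"major": "accounting", "age": 20, "gpa": 3.3},
    {"major": "chemistry", "age": 22, "gpa": 3.4},
    {"major": "chemistry", "age": 23, "gpa": 3.5},
]

# The same function is applied to both classes: this is the synchronous part.
prime_target = generalize(graduates)
prime_contrasting = generalize(undergraduates)
for tup in sorted(set(prime_target) | set(prime_contrasting)):
    print(tup, prime_target.get(tup, 0.0), prime_contrasting.get(tup, 0.0))
```

Printing the two count% columns side by side for each generalized tuple gives the kind of class-by-class contrast that the prime relations are meant to support.
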