                                         6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods  267


Table 6.6 2 × 2 Contingency Table Summarizing the Transactions with Respect to Game and Video Purchases

             game     ¬game    Σ_row
video        4000     3500     7500
¬video       2000      500     2500
Σ_col        6000     4000     10,000


Table 6.7 Table 6.6 Contingency Table, Now with the Expected Values

             game           ¬game          Σ_row
video        4000 (4500)    3500 (3000)    7500
¬video       2000 (1500)     500 (1000)    2500
Σ_col        6000           4000           10,000
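
The expected values in parentheses are not derived on this page. Assuming the standard independence formula (row total times column total, divided by the grand total $n$; the symbols $e_{ij}$, $r_i$, and $c_j$ are introduced here only for illustration), they can be reproduced as follows:

```latex
\[
e_{ij} = \frac{r_i \times c_j}{n}, \qquad n = 10{,}000
\]
\[
e_{\text{video},\,\text{game}} = \frac{7500 \times 6000}{10{,}000} = 4500, \qquad
e_{\text{video},\,\neg\text{game}} = \frac{7500 \times 4000}{10{,}000} = 3000,
\]
\[
e_{\neg\text{video},\,\text{game}} = \frac{2500 \times 6000}{10{,}000} = 1500, \qquad
e_{\neg\text{video},\,\neg\text{game}} = \frac{2500 \times 4000}{10{,}000} = 1000.
\]
```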

Example 6.9 Correlation analysis using $\chi^2$. To compute the correlation using $\chi^2$ analysis for
nominal data, we need the observed value and the expected value (displayed in parentheses) for
each slot of the contingency table, as shown in Table 6.7. From the table, we can compute
the $\chi^2$ value as follows:
\[
\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
       = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000}
       + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6.
\]
Because the $\chi^2$ value is greater than 1, and the observed value of the slot (game, video) is
4000, which is less than the expected value of 4500, buying game and buying video are
negatively correlated. This is consistent with the conclusion derived from the analysis of
the lift measure in Example 6.8.
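
The arithmetic of this example can be checked with a short script. The following is a minimal sketch, not code from the text: the names `observed` and `chi_square` are hypothetical, and the expected counts are assumed to come from the usual independence formula (row total times column total, divided by the grand total).

```python
# Observed counts from Table 6.6.
# Rows: (video, not video); columns: (game, not game).
observed = [[4000, 3500],
            [2000, 500]]


def chi_square(table):
    """Return (chi2, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (observed - expected)^2 / expected over every slot.
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return chi2, expected


chi2, expected = chi_square(observed)
print(round(chi2, 1))                   # 555.6
print(expected)                         # [[4500.0, 3000.0], [1500.0, 1000.0]]
# Observed (game, video) count below its expected count, as in the example.
print(observed[0][0] < expected[0][0])  # True
```

The function takes any rectangular table of counts, so the same sketch applies to contingency tables larger than 2 × 2.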



                         6.3.3 A Comparison of Pattern Evaluation Measures
The above discussion shows that instead of using the simple support–confidence framework
to evaluate frequent patterns, other measures, such as lift and $\chi^2$, often disclose
more intrinsic pattern relationships. How effective are these measures? Should we also
consider other alternatives?
  Researchers studied many pattern evaluation measures even before in-depth research
on scalable methods for mining frequent patterns began. Recently, several
other pattern evaluation measures have attracted interest. In this subsection, we present