6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods


                               subjective, may differ from one user to another. However, objective interestingness
                               measures, based on the statistics “behind” the data, can be used as one step toward the
                               goal of weeding out uninteresting rules that would otherwise be presented to the user.
                                 “How can we tell which strong association rules are really interesting?” Let’s examine
                               the following example.

                  Example 6.7 A misleading “strong” association rule. Suppose we are interested in analyzing
                               transactions at AllElectronics with respect to the purchase of computer games and videos.
                               Let game refer to the transactions containing computer games, and video refer to those
                               containing videos. Of the 10,000 transactions analyzed, the data show that 6000 of the
                               customer transactions included computer games, while 7500 included videos, and 4000
                               included both computer games and videos. Suppose that a data mining program for
                               discovering association rules is run on the data, using a minimum support of, say, 30%
                               and a minimum confidence of 60%. The following association rule is discovered:
                                              buys(X, “computer games”) ⇒ buys(X, “videos”)
                                                  [support = 40%, confidence = 66%].             (6.6)
                               Rule (6.6) is a strong association rule and would therefore be reported, since its support
                               value of 4000/10,000 = 40% and confidence value of 4000/6000 = 66% satisfy the minimum
                               support and minimum confidence thresholds, respectively. However, Rule (6.6) is misleading
                               because the probability of purchasing videos is 75%, which is even larger than 66%. In
                               fact, computer games and videos are negatively associated because the purchase of one
                               of these items actually decreases the likelihood of purchasing the other. Without fully
                               understanding this phenomenon, we could easily make unwise business decisions based
                               on Rule (6.6).
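
                               The arithmetic in Example 6.7 can be checked with a short Python sketch. The counts and
                               thresholds come from the example; the variable names are illustrative assumptions. Note
                               that 4000/6000 is about 66.7%, which the example reports as 66%.

```python
# Counts taken from Example 6.7 (variable names are illustrative).
n_total = 10_000   # transactions analyzed
n_game = 6_000     # transactions containing computer games
n_video = 7_500    # transactions containing videos
n_both = 4_000     # transactions containing both items

# Thresholds used in the example.
min_support = 0.30
min_confidence = 0.60

support = n_both / n_total      # fraction of all transactions with both items
confidence = n_both / n_game    # fraction of game transactions that include videos
p_video = n_video / n_total     # baseline probability of buying videos

# A rule is "strong" when it meets both thresholds.
is_strong = support >= min_support and confidence >= min_confidence

# The rule misleads if knowing about the game purchase lowers the video probability.
lowers_likelihood = confidence < p_video

print(f"support    = {support:.1%}")
print(f"confidence = {confidence:.1%}")
print(f"P(video)   = {p_video:.1%}")
print("strong rule:", is_strong)
print("game purchase lowers video likelihood:", lowers_likelihood)
```

                               The rule passes both thresholds, yet its confidence is below the unconditional probability
                               of buying videos, which is the situation the example warns about.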

                                 Example 6.7 also illustrates that the confidence of a rule A ⇒ B can be deceiving. It
                               does not measure the real strength (or lack of strength) of the correlation and
                               implication between A and B. Hence, alternatives to the support–confidence framework can
                               be useful in mining interesting data relationships.


                         6.3.2 From Association Analysis to Correlation Analysis
                               As we have seen, the support and confidence measures are insufficient for filtering
                               out uninteresting association rules. To tackle this weakness, a correlation measure can
                               be used to augment the support–confidence framework for association rules. This leads
                               to correlation rules of the form
                                                 A ⇒ B [support, confidence, correlation].       (6.7)

                               That is, a correlation rule is measured not only by its support and confidence but also
                               by the correlation between itemsets A and B. There are many different correlation
                               measures from which to choose. In this subsection, we study several correlation measures
                               to determine which would be good for mining large data sets.
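
                               A rule of the form (6.7) can be sketched as a small record with a pluggable correlation
                               measure. Everything named here (CorrelationRule, make_rule, joint_minus_independent) is
                               a hypothetical illustration. The stand-in measure, the observed joint probability minus
                               the product expected under independence, is an assumption chosen for simplicity and is
                               not necessarily one of the measures this subsection goes on to study.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CorrelationRule:
    """A rule A => B annotated as in (6.7): [support, confidence, correlation]."""
    antecedent: str
    consequent: str
    support: float
    confidence: float
    correlation: float


def joint_minus_independent(p_a: float, p_b: float, p_ab: float) -> float:
    """Stand-in measure (an assumption): zero under independence,
    negative when A and B co-occur less often than independence predicts."""
    return p_ab - p_a * p_b


def make_rule(
    antecedent: str,
    consequent: str,
    n_a: int,
    n_b: int,
    n_ab: int,
    n_total: int,
    measure: Callable[[float, float, float], float],
) -> CorrelationRule:
    """Build a correlation rule from raw transaction counts."""
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return CorrelationRule(
        antecedent=antecedent,
        consequent=consequent,
        support=p_ab,
        confidence=n_ab / n_a,
        correlation=measure(p_a, p_b, p_ab),
    )


# Counts from Example 6.7.
rule = make_rule("computer games", "videos", 6000, 7500, 4000, 10_000,
                 joint_minus_independent)
print(rule)
```

                               With the counts from Example 6.7, the stand-in measure comes out negative, in line with
                               the observation that computer games and videos are negatively associated. Any other
                               measure with the same three-argument signature could be passed in its place.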