6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods 265
subjective, may differ from one user to another. However, objective interestingness mea-
sures, based on the statistics “behind” the data, can be used as one step toward the goal
of weeding out uninteresting rules that would otherwise be presented to the user.
“How can we tell which strong association rules are really interesting?” Let’s examine
the following example.
Example 6.7 A misleading “strong” association rule. Suppose we are interested in analyzing trans-
actions at AllElectronics with respect to the purchase of computer games and videos.
Let game refer to the transactions containing computer games, and video refer to those
containing videos. Of the 10,000 transactions analyzed, the data show that 6000 of the
customer transactions included computer games, while 7500 included videos, and 4000
included both computer games and videos. Suppose that a data mining program for
discovering association rules is run on the data, using a minimum support of, say, 30%
and a minimum confidence of 60%. The following association rule is discovered:
buys(X, “computer games”) ⇒ buys(X, “videos”)
[support = 40%, confidence = 66%]. (6.6)
Rule (6.6) is a strong association rule and would therefore be reported, since its support
value of 4000/10,000 = 40% and confidence value of 4000/6000 = 66% satisfy the minimum support
and minimum confidence thresholds, respectively. However, Rule (6.6) is misleading
because the probability of purchasing videos is 75%, which is even larger than 66%. In
fact, computer games and videos are negatively associated because the purchase of one
of these items actually decreases the likelihood of purchasing the other. Without fully
understanding this phenomenon, we could easily make unwise business decisions based
on Rule (6.6).
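The arithmetic behind Example 6.7 can be checked with a short script. The following is a minimal sketch, not part of any particular mining tool: the function name rule_stats and the variable names are hypothetical, and only the four transaction counts and the two thresholds come from the example. It computes the support and confidence of game ⇒ video and compares the confidence with the overall probability of buying videos.

```python
# Transaction counts taken from Example 6.7.
n_total = 10_000   # all transactions analyzed
n_game = 6_000     # transactions containing computer games
n_video = 7_500    # transactions containing videos
n_both = 4_000     # transactions containing both

def rule_stats(n_total, n_a, n_b, n_both):
    """Return (support, confidence, baseline) for the rule A => B.

    support    = P(A and B)
    confidence = P(B | A)
    baseline   = P(B), the probability of B regardless of A
    (rule_stats is a hypothetical helper used only for illustration.)
    """
    support = n_both / n_total
    confidence = n_both / n_a
    baseline = n_b / n_total
    return support, confidence, baseline

support, confidence, baseline = rule_stats(n_total, n_game, n_video, n_both)

# Thresholds from the example: min support 30%, min confidence 60%.
is_strong = support >= 0.30 and confidence >= 0.60

# Probability of buying videos among transactions WITHOUT games.
conf_without_game = (n_video - n_both) / (n_total - n_game)

print(f"support            = {support:.1%}")
print(f"confidence         = {confidence:.1%}")
print(f"P(video)           = {baseline:.1%}")
print(f"P(video | no game) = {conf_without_game:.1%}")
print(f"strong rule?       = {is_strong}")
print(f"confidence below baseline? = {confidence < baseline}")
```

The rule passes both thresholds, yet its confidence falls below the 75% baseline, and customers who did not buy games bought videos more often than those who did. This is the sense in which the two items are negatively associated.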
Example 6.7 also illustrates that the confidence of a rule A ⇒ B can be deceiving. It
does not measure the real strength (or lack of strength) of the correlation and implica-
tion between A and B. Hence, alternatives to the support–confidence framework can be
useful in mining interesting data relationships.
6.3.2 From Association Analysis to Correlation Analysis
As we have seen so far, the support and confidence measures are insufficient for filtering
out uninteresting association rules. To tackle this weakness, a correlation measure can
be used to augment the support–confidence framework for association rules. This leads
to correlation rules of the form
A ⇒ B [support, confidence, correlation]. (6.7)
That is, a correlation rule is measured not only by its support and confidence but also
by the correlation between itemsets A and B. There are many different correlation mea-
sures from which to choose. In this subsection, we study several correlation measures to
determine which would be good for mining large data sets.