6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods

subjective, may differ from one user to another. However, objective interestingness
measures, based on the statistics “behind” the data, can be used as one step toward the goal
of weeding out uninteresting rules that would otherwise be presented to the user.
“How can we tell which strong association rules are really interesting?” Let’s examine
the following example.

Example 6.7 A misleading “strong” association rule. Suppose we are interested in analyzing
transactions at AllElectronics with respect to the purchase of computer games and videos.
Let game refer to the transactions containing computer games, and video refer to those
containing videos. Of the 10,000 transactions analyzed, the data show that 6000 of the
customer transactions included computer games, while 7500 included videos, and 4000
included both computer games and videos. Suppose that a data mining program for
discovering association rules is run on the data, using a minimum support of, say, 30%
and a minimum confidence of 60%. The following association rule is discovered:
buys(X, “computer games”) ⇒ buys(X, “videos”)
[support = 40%, confidence = 66%]. (6.6)
Rule (6.6) is a strong association rule and would therefore be reported, since its support
value of 4000/10,000 = 40% and confidence value of 4000/6000 = 66% satisfy the minimum support
and minimum confidence thresholds, respectively. However, Rule (6.6) is misleading
because the overall probability of purchasing videos is 7500/10,000 = 75%, which is even larger than 66%. In
fact, computer games and videos are negatively associated because the purchase of one
of these items actually decreases the likelihood of purchasing the other. Without fully
understanding this phenomenon, we could easily make unwise business decisions based
on Rule (6.6).
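
The arithmetic in Example 6.7 can be checked with a minimal sketch such as the one below. The counts and thresholds come from the example; the variable names are illustrative assumptions, not notation from the text. The sketch computes the rule's support and confidence, then compares each conditional probability with the corresponding overall purchase probability.

```python
# Counts taken from Example 6.7 (variable names are illustrative).
n_total = 10_000   # all transactions analyzed
n_game = 6_000     # transactions containing computer games
n_video = 7_500    # transactions containing videos
n_both = 4_000     # transactions containing both items

# Support and confidence of the rule game => video.
support = n_both / n_total        # P(game and video)
confidence = n_both / n_game      # P(video | game)

# Overall (unconditional) purchase probabilities used as baselines.
p_video = n_video / n_total       # P(video)
p_game = n_game / n_total         # P(game)

# Confidence of the reverse direction, video => game.
confidence_reverse = n_both / n_video   # P(game | video)

# Thresholds from the example: 30% minimum support, 60% minimum confidence.
min_support, min_confidence = 0.30, 0.60
is_strong = support >= min_support and confidence >= min_confidence

print(f"support = {support:.1%}, confidence = {confidence:.1%}, strong = {is_strong}")
print(f"P(video | game) = {confidence:.1%} vs. P(video) = {p_video:.1%}")
print(f"P(game | video) = {confidence_reverse:.1%} vs. P(game) = {p_game:.1%}")
```

In both directions the conditional probability falls below the overall probability, which is what it means for buying one item to decrease the likelihood of buying the other, even though the rule passes both thresholds.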

Example 6.7 also illustrates that the confidence of a rule A ⇒ B can be deceiving. It
does not measure the real strength (or lack of strength) of the correlation and implication
between A and B. Hence, alternatives to the support–confidence framework can be
useful in mining interesting data relationships.

6.3.2 From Association Analysis to Correlation Analysis
As we have seen so far, the support and confidence measures are insufficient at filtering
out uninteresting association rules. To tackle this weakness, a correlation measure can
be used to augment the support–confidence framework for association rules. This leads
to correlation rules of the form
A ⇒ B [support, confidence, correlation]. (6.7)

That is, a correlation rule is measured not only by its support and confidence but also
by the correlation between itemsets A and B. There are many different correlation measures
from which to choose. In this subsection, we study several correlation measures to
determine which would be good for mining large data sets.
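
As a rough sketch of the form in (6.7), a correlation rule can be represented as a record that carries a third value alongside support and confidence. The class name `CorrelationRule` and the helper `confidence_gain` are hypothetical. In particular, `confidence_gain` (the difference P(B | A) − P(B)) is only a stand-in chosen for illustration, not one of the correlation measures studied in this subsection; any chosen measure could be stored in its place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationRule:
    """Hypothetical record for a rule A => B [support, confidence, correlation]."""
    antecedent: str
    consequent: str
    support: float
    confidence: float
    correlation: float  # value of whichever correlation measure is chosen

    def __str__(self) -> str:
        return (
            f"{self.antecedent} => {self.consequent} "
            f"[support = {self.support:.1%}, "
            f"confidence = {self.confidence:.1%}, "
            f"correlation = {self.correlation:+.3f}]"
        )


def confidence_gain(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """Illustrative stand-in measure (an assumption): P(B | A) - P(B).

    A negative value means that B is less likely when A is present.
    """
    return n_ab / n_a - n_b / n_total


# Counts from Example 6.7: A = computer games, B = videos.
n_total, n_game, n_video, n_both = 10_000, 6_000, 7_500, 4_000

rule = CorrelationRule(
    antecedent="computer games",
    consequent="videos",
    support=n_both / n_total,
    confidence=n_both / n_game,
    correlation=confidence_gain(n_game, n_video, n_both, n_total),
)
print(rule)
```

Keeping the correlation value as a plain field, rather than fixing one formula inside the class, leaves the choice of measure open.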
