Page 291 -

P. 291

HAN
2011/6/1
13-ch06-243-278-9780123814791
254 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 254 #12

6.2.2 Generating Association Rules from Frequent Itemsets
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong associa-
tion rules satisfy both minimum support and minimum conﬁdence). This can be done
using Eq. (6.4) for conﬁdence, which we show again here for completeness:
support count(A ∪ B)
conﬁdence(A ⇒ B) = P(B|A) = .
support count(A)

The conditional probability is expressed in terms of itemset support count, where
support count(A ∪ B) is the number of transactions containing the itemsets A ∪ B, and
support count(A) is the number of transactions containing the itemset A. Based on this
equation, association rules can be generated as follows:

For each frequent itemset l, generate all nonempty subsets of l.
support count(l)
For every nonempty subset s of l, output the rule “s ⇒ (l − s)” if ≥
support count(s)
min conf, where min conf is the minimum conﬁdence threshold.
Because the rules are generated from frequent itemsets, each one automatically satis-
ﬁes the minimum support. Frequent itemsets can be stored ahead of time in hash tables
along with their counts so that they can be accessed quickly.

Example 6.4 Generating association rules. Let’s try an example based on the transactional data for
AllElectronics shown before in Table 6.1. The data contain frequent itemset X = {I1, I2,
I5}. What are the association rules that can be generated from X? The nonempty subsets
of X are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are
as shown below, each listed with its conﬁdence:
{I1,I2} ⇒ I5, conﬁdence = 2/4 = 50%
{I1,I5} ⇒ I2, conﬁdence = 2/2 = 100%
{I2,I5} ⇒ I1, conﬁdence = 2/2 = 100%
I1 ⇒ {I2,I5}, conﬁdence = 2/6 = 33%
I2 ⇒ {I1,I5}, conﬁdence = 2/7 = 29%
I5 ⇒ {I1,I2}, conﬁdence = 2/2 = 100%
If the minimum conﬁdence threshold is, say, 70%, then only the second, third, and
last rules are output, because these are the only ones generated that are strong. Note
that, unlike conventional classiﬁcation rules, association rules can contain more than
one conjunct in the right side of the rule.

6.2.3 Improving the Efﬁciency of Apriori

“How can we further improve the efﬁciency of Apriori-based mining?” Many variations of
the Apriori algorithm have been proposed that focus on improving the efﬁciency of the
original algorithm. Several of these variations are summarized as follows:

286 287 288 289 290 291 292 293 294 295 296