(a) Join: C3 = L2 ⋈ L2
= {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} ⋈ {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
= {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
(b) Prune using the Apriori property: All nonempty subsets of a frequent itemset must also be frequent. Do any of the candidates have a subset that is not frequent?
The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All 2-item subsets of {I1, I2, I3} are members of L2. Therefore, keep {I1, I2, I3} in C3.
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All 2-item subsets of {I1, I2, I5} are members of L2. Therefore, keep {I1, I2, I5} in C3.
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not a member of L2, and so it is not frequent. Therefore, remove {I1, I3, I5} from C3.
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a member of L2, and so it is not frequent. Therefore, remove {I2, I3, I4} from C3.
The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not a member of L2, and so it is not frequent. Therefore, remove {I2, I3, I5} from C3.
The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a member of L2, and so it is not frequent. Therefore, remove {I2, I4, I5} from C3.
(c) Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning.
Figure 6.3 Generation and pruning of candidate 3-itemsets, C3, from L2 using the Apriori property.
search strategy. The resulting pruned version of C3 is shown in the first table of the bottom row of Figure 6.2; a code sketch of this join-and-prune step follows step 8 below.
7. The transactions in D are scanned to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support (Figure 6.2).
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, itemset {I1, I2, I3, I5} is pruned because its subset {I2, I3, I5} is not frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets.
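For concreteness, the join-and-prune logic carried out in Figure 6.3 (and again in step 8) can be written as a short routine. The following is a minimal sketch, not the book's pseudocode: itemsets are represented as lexicographically sorted tuples, and the function name apriori_gen simply mirrors the procedure discussed below.

from itertools import combinations

def apriori_gen(L_prev, k):
    """Candidate k-itemsets from the frequent (k-1)-itemsets: join, then prune."""
    L_prev = set(L_prev)
    joined = set()
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for a in L_prev:
        for b in L_prev:
            if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]:
                joined.add(a + (b[k - 2],))
    # Prune step (Apriori property): drop any candidate having an infrequent
    # (k-1)-subset.
    return {c for c in joined
            if all(s in L_prev for s in combinations(c, k - 1))}

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2, 3)))
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]  -- the pruned C3 of Figure 6.3

Applying the same function to L3 = {("I1", "I2", "I3"), ("I1", "I2", "I5")} with k = 4 returns the empty set, matching the termination in step 8.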
Figure 6.4 shows pseudocode for the Apriori algorithm and its related procedures. Step 1 of Apriori finds the frequent 1-itemsets, L1. In steps 2 through 10, Lk−1 is used to generate candidates Ck in order to find Lk for k ≥ 2. The apriori_gen procedure generates the candidates and then uses the Apriori property to eliminate those having a subset that is not frequent (step 3). This procedure is described later. Once all of the candidates have been generated, the database is scanned (step 4). For each transaction, a subset function is used to find all subsets of the transaction that are candidates (step 5), and the count for each of these candidates is accumulated (steps 6 and 7). Finally, all the candidates satisfying the minimum support (step 9) form the set of frequent itemsets, L (step 11).
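As a rough illustration of this loop (a sketch, not the Figure 6.4 pseudocode itself), the following code runs the level-wise search end to end. It assumes a minimum support count of 2 and a transaction list D intended to match the nine transactions of the chapter's running example; all function and variable names here are illustrative.

from itertools import combinations

def apriori_gen(L_prev, k):
    """Join then prune, as in the sketch following step 8 above."""
    L_prev = set(L_prev)
    joined = {a + (b[k - 2],) for a in L_prev for b in L_prev
              if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]}
    return {c for c in joined
            if all(s in L_prev for s in combinations(c, k - 1))}

def apriori(transactions, min_sup):
    """Return every frequent itemset (as a sorted tuple) with its support count."""
    data = [frozenset(t) for t in transactions]
    # Step 1: one scan of the data to find the frequent 1-itemsets, L1.
    counts = {}
    for t in data:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    L = {c: n for c, n in counts.items() if n >= min_sup}
    frequent, k = dict(L), 2
    # Steps 2-10: generate Ck from Lk-1, scan the data to accumulate counts,
    # and keep the candidates that reach minimum support as Lk.
    while L:
        counts = {c: 0 for c in apriori_gen(L, k)}
        for t in data:
            for c in counts:
                if t.issuperset(c):
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent  # step 11: L, the union of all Lk

# Assumed data: the nine transactions of the chapter's running example.
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
     {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
     {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
for itemset, count in sorted(apriori(D, min_sup=2).items()):
    print(itemset, count)

With this data the result includes the two 3-itemsets found above, {I1, I2, I3} and {I1, I2, I5}, and the search stops at k = 4, just as in the worked example.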