Page 287 -
P. 287
HAN 13-ch06-243-278-9780123814791
2011/6/1
250 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 250 #8
is used as follows. Any (k − 1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset. Hence, if any (k − 1)-subset of a candidate k-itemset is not in
L k−1 , then the candidate cannot be frequent either and so can be removed from C k .
This subset testing can be done quickly by maintaining a hash tree of all frequent
itemsets.
Example 6.3 Apriori. Let’s look at a concrete example, based on the AllElectronics transaction
database, D, of Table 6.1. There are nine transactions in this database, that is, |D| = 9.
We use Figure 6.2 to illustrate the Apriori algorithm for finding frequent itemsets in D.
1. In the first iteration of the algorithm, each item is a member of the set of candidate
1-itemsets, C 1 . The algorithm simply scans all of the transactions to count the
number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min sup = 2. (Here,
we are referring to absolute support because we are using a support count. The corre-
sponding relative support is 2/9 = 22%.) The set of frequent 1-itemsets, L 1 , can then
be determined. It consists of the candidate 1-itemsets satisfying minimum support.
In our example, all of the candidates in C 1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L 2 , the algorithm uses the join L 1 1 L 1 to
7
generate a candidate set of 2-itemsets, C 2 . C 2 consists of |L 1 | 2-itemsets. Note that
2
no candidates are removed from C 2 during the prune step because each subset of the
candidates is also frequent.
Table 6.1 Transactional Data for an AllElectronics
Branch
TID List of item IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3
7 L 1 1 L 1 is equivalent to L 1 × L 1 , since the definition of L k 1 L k requires the two joining itemsets to
share k − 1 = 0 items.