Page 287 -
P. 287

HAN 13-ch06-243-278-9780123814791
                                                             2011/6/1

          250   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20  Page 250  #8



                           is used as follows. Any (k − 1)-itemset that is not frequent cannot be a subset of a
                           frequent k-itemset. Hence, if any (k − 1)-subset of a candidate k-itemset is not in
                           L k−1 , then the candidate cannot be frequent either and so can be removed from C k .
                           This subset testing can be done quickly by maintaining a hash tree of all frequent
                           itemsets.


            Example 6.3 Apriori. Let’s look at a concrete example, based on the AllElectronics transaction
                         database, D, of Table 6.1. There are nine transactions in this database, that is, |D| = 9.
                         We use Figure 6.2 to illustrate the Apriori algorithm for finding frequent itemsets in D.

                         1. In the first iteration of the algorithm, each item is a member of the set of candidate
                           1-itemsets, C 1 . The algorithm simply scans all of the transactions to count the
                           number of occurrences of each item.
                         2. Suppose that the minimum support count required is 2, that is, min sup = 2. (Here,
                           we are referring to absolute support because we are using a support count. The corre-
                           sponding relative support is 2/9 = 22%.) The set of frequent 1-itemsets, L 1 , can then
                           be determined. It consists of the candidate 1-itemsets satisfying minimum support.
                           In our example, all of the candidates in C 1 satisfy minimum support.
                         3. To discover the set of frequent 2-itemsets, L 2 , the algorithm uses the join L 1 1 L 1 to
                                                            7
                           generate a candidate set of 2-itemsets, C 2 . C 2 consists of  |L 1 |    2-itemsets. Note that
                                                                           2
                           no candidates are removed from C 2 during the prune step because each subset of the
                           candidates is also frequent.
               Table 6.1 Transactional Data for an AllElectronics
                         Branch
                         TID                 List of item IDs
                         T100                I1, I2, I5
                         T200                I2, I4
                         T300                I2, I3
                         T400                I1, I2, I4
                         T500                I1, I3
                         T600                I2, I3
                         T700                I1, I3
                         T800                I1, I2, I3, I5
                         T900                I1, I2, I3



                         7 L 1 1 L 1 is equivalent to L 1 × L 1 , since the definition of L k 1 L k requires the two joining itemsets to
                         share k − 1 = 0 items.
   282   283   284   285   286   287   288   289   290   291   292