Page 289 -
P. 289

2011/6/1
                               13-ch06-243-278-9780123814791
                         HAN
          252   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20 Page 252  #10



                         (a) Join: C 3 = L 2 1 L 2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
                                           1{{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
                                          = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
                         (b) Prune using the Apriori property: All nonempty subsets of a frequent itemset must also be
                            frequent. Do any of the candidates have a subset that is not frequent?

                              The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All 2-item subsets
                              of {I1, I2, I3} are members of L 2 . Therefore, keep {I1, I2, I3} in C 3 .
                              The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All 2-item subsets of
                              {I1, I2, I5} are members of L 2 . Therefore, keep {I1, I2, I5} in C 3 .
                              The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not
                              a member of L 2 , and so it is not frequent. Therefore, remove {I1, I3, I5} from C 3 .
                              The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a
                              member of L 2 , and so it is not frequent. Therefore, remove {I2, I3, I4} from C 3 .
                              The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not
                              a member of L 2 , and so it is not frequent. Therefore, remove {I2, I3, I5} from C 3 .
                              The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a
                              member of L 2 , and so it is not frequent. Therefore, remove {I2, I4, I5} from C 3 .
                         (c) Therefore, C 3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning.


               Figure 6.3 Generation and pruning of candidate 3-itemsets, C 3 , from L 2 using the Apriori property.




                           search strategy. The resulting pruned version of C 3 is shown in the first table of the
                           bottom row of Figure 6.2.
                         7. The transactions in D are scanned to determine L 3 , consisting of those candidate
                           3-itemsets in C 3 having minimum support (Figure 6.2).
                         8. The algorithm uses L 3 1 L 3 to generate a candidate set of 4-itemsets, C 4 . Although
                           the join results in {{I1, I2, I3, I5}}, itemset {I1, I2, I3, I5} is pruned because its subset
                           {I2, I3, I5} is not frequent. Thus, C 4 = φ, and the algorithm terminates, having found
                           all of the frequent itemsets.

                           Figure 6.4 shows pseudocode for the Apriori algorithm and its related procedures.
                         Step 1 of Apriori finds the frequent 1-itemsets, L 1 . In steps 2 through 10, L k−1 is used
                         to generate candidates C k to find L k for k ≥ 2. The apriori gen procedure generates the
                         candidates and then uses the Apriori property to eliminate those having a subset that is
                         not frequent (step 3). This procedure is described later. Once all of the candidates have
                         been generated, the database is scanned (step 4). For each transaction, a subset function
                         is used to find all subsets of the transaction that are candidates (step 5), and the count
                         for each of these candidates is accumulated (steps 6 and 7). Finally, all the candidates
                         satisfying the minimum support (step 9) form the set of frequent itemsets, L (step 11).
   284   285   286   287   288   289   290   291   292   293   294