Page 290 -
P. 290

2011/6/1
                                                                     3:20 Page 253
                               13-ch06-243-278-9780123814791
                         HAN
                                                                                    #11
                                                               6.2 Frequent Itemset Mining Methods  253


                               Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based
                                on candidate generation.
                               Input:
                                   D, a database of transactions;
                                   min sup, the minimum support count threshold.
                               Output: L, frequent itemsets in D.
                               Method:
                               (1)  L 1 = find frequent 1-itemsets(D);
                               (2)  for (k = 2;L k−1 6= φ;k++) {
                               (3)    C k = apriori gen(L k−1 );
                               (4)    for each transaction t ∈ D { // scan D for counts
                               (5)       C t = subset(C k , t); // get the subsets of t that are candidates
                               (6)       for each candidate c ∈ C t
                               (7)          c.count++;
                               (8)    }
                               (9)    L k = {c ∈ C k |c.count ≥ min sup}
                               (10)  }
                               (11)  return L = ∪ k L k ;
                               procedure apriori gen(L k−1 :frequent (k − 1)-itemsets)
                               (1)  for each itemset l 1 ∈ L k−1
                               (2)    for each itemset l 2 ∈ L k−1
                               (3)       if (l 1 [1] = l 2 [1]) ∧ (l 1 [2] = l 2 [2])
                                            ∧... ∧ (l 1 [k − 2] = l 2 [k − 2]) ∧ (l 1 [k − 1] < l 2 [k − 1]) then {
                               (4)          c = l 1 1 l 2 ; // join step: generate candidates
                               (5)          if has infrequent subset(c, L k−1 ) then
                               (6)             delete c; // prune step: remove unfruitful candidate
                               (7)          else add c to C k ;
                               (8)       }
                               (9)  return C k ;
                               procedure has infrequent subset(c: candidate k-itemset;
                                      L k−1 : frequent (k − 1)-itemsets); // use prior knowledge
                               (1)  for each (k − 1)-subset s of c
                               (2)    if s 6∈ L k−1 then
                               (3)       return TRUE;
                               (4)  return FALSE;

                     Figure 6.4 Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.



                               A procedure can then be called to generate association rules from the frequent itemsets.
                               Such a procedure is described in Section 6.2.2.
                                 The apriori gen procedure performs two kinds of actions, namely, join and prune, as
                               described before. In the join component, L k−1 is joined with L k−1 to generate potential
                               candidates (steps 1–4). The prune component (steps 5–7) employs the Apriori property
                               to remove candidates that have a subset that is not frequent. The test for infrequent
                               subsets is shown in procedure has infrequent subset.
   285   286   287   288   289   290   291   292   293   294   295