Page 290 -
P. 290
2011/6/1
3:20 Page 253
13-ch06-243-278-9780123814791
HAN
#11
6.2 Frequent Itemset Mining Methods 253
Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based
on candidate generation.
Input:
D, a database of transactions;
min sup, the minimum support count threshold.
Output: L, frequent itemsets in D.
Method:
(1) L 1 = find frequent 1-itemsets(D);
(2) for (k = 2;L k−1 6= φ;k++) {
(3) C k = apriori gen(L k−1 );
(4) for each transaction t ∈ D { // scan D for counts
(5) C t = subset(C k , t); // get the subsets of t that are candidates
(6) for each candidate c ∈ C t
(7) c.count++;
(8) }
(9) L k = {c ∈ C k |c.count ≥ min sup}
(10) }
(11) return L = ∪ k L k ;
procedure apriori gen(L k−1 :frequent (k − 1)-itemsets)
(1) for each itemset l 1 ∈ L k−1
(2) for each itemset l 2 ∈ L k−1
(3) if (l 1 [1] = l 2 [1]) ∧ (l 1 [2] = l 2 [2])
∧... ∧ (l 1 [k − 2] = l 2 [k − 2]) ∧ (l 1 [k − 1] < l 2 [k − 1]) then {
(4) c = l 1 1 l 2 ; // join step: generate candidates
(5) if has infrequent subset(c, L k−1 ) then
(6) delete c; // prune step: remove unfruitful candidate
(7) else add c to C k ;
(8) }
(9) return C k ;
procedure has infrequent subset(c: candidate k-itemset;
L k−1 : frequent (k − 1)-itemsets); // use prior knowledge
(1) for each (k − 1)-subset s of c
(2) if s 6∈ L k−1 then
(3) return TRUE;
(4) return FALSE;
Figure 6.4 Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.
A procedure can then be called to generate association rules from the frequent itemsets.
Such a procedure is described in Section 6.2.2.
The apriori gen procedure performs two kinds of actions, namely, join and prune, as
described before. In the join component, L k−1 is joined with L k−1 to generate potential
candidates (steps 1–4). The prune component (steps 5–7) employs the Apriori property
to remove candidates that have a subset that is not frequent. The test for infrequent
subsets is shown in procedure has infrequent subset.