Page 286 -

P. 286

3:20
#7
Page 249
2011/6/1
HAN 13-ch06-243-278-9780123814791
6.2 Frequent Itemset Mining Methods 249

collecting those items that satisfy minimum support. The resulting set is denoted by L 1 .
Next, L 1 is used to ﬁnd L 2 , the set of frequent 2-itemsets, which is used to ﬁnd L 3 , and
so on, until no more frequent k-itemsets can be found. The ﬁnding of each L k requires
one full scan of the database.
To improve the efﬁciency of the level-wise generation of frequent itemsets, an
important property called the Apriori property is used to reduce the search space.

Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By deﬁnition, if an item-
set I does not satisfy the minimum support threshold, min sup, then I is not frequent,
that is, P(I) < min sup. If an item A is added to the itemset I, then the resulting itemset
(i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either,
that is, P(I ∪ A) < min sup.
This property belongs to a special category of properties called antimonotonicity in
the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It
is called antimonotonicity because the property is monotonic in the context of failing a
test. 6
“How is the Apriori property used in the algorithm?” To understand this, let us look at
how L k−1 is used to ﬁnd L k for k ≥ 2. A two-step process is followed, consisting of join
and prune actions.
1. The join step: To ﬁnd L k , a set of candidate k-itemsets is generated by joining
L k−1 with itself. This set of candidates is denoted C k . Let l 1 and l 2 be itemsets
in L k−1 . The notation l i [j] refers to the jth item in l i (e.g., l 1 [k − 2] refers to
the second to the last item in l 1 ). For efﬁcient implementation, Apriori assumes
that items within a transaction or itemset are sorted in lexicographic order. For
the (k − 1)-itemset, l i , this means that the items are sorted such that l i [1] < l i [2]
< ··· < l i [k − 1]. The join, L k−1 1 L k−1 , is performed, where members of L k−1 are
joinable if their ﬁrst (k − 2) items are in common. That is, members l 1 and l 2
of L k−1 are joined if (l 1 [1] = l 2 [1]) ∧ (l 1 [2] = l 2 [2]) ∧ ··· ∧ (l 1 [k − 2] = l 2 [k − 2])
∧(l 1 [k − 1] < l 2 [k − 1]). The condition l 1 [k − 1] < l 2 [k − 1] simply ensures that
no duplicates are generated. The resulting itemset formed by joining l 1 and l 2 is
{l 1 [1], l 1 [2],..., l 1 [k − 2], l 1 [k − 1], l 2 [k − 1]}.
2. The prune step: C k is a superset of L k , that is, its members may or may not be
frequent, but all of the frequent k-itemsets are included in C k . A database scan to
determine the count of each candidate in C k would result in the determination of
L k (i.e., all candidates having a count no less than the minimum support count are
frequent by deﬁnition, and therefore belong to L k ). C k , however, can be huge, and so
this could involve heavy computation. To reduce the size of C k , the Apriori property

6 The Apriori property has many applications. For example, it can also be used to prune search during
data cube computation (Chapter 5).

281 282 283 284 285 286 287 288 289 290 291