Page 300 -

P. 300

3:20 Page 263
#21
13-ch06-243-278-9780123814791
2011/6/1
HAN
6.2 Frequent Itemset Mining Methods 263

can identify the case of closed itemsets during mining. Pruning strategies include the
following:
Item merging: If every transaction containing a frequent itemset X also contains an itemset
Y but not any proper superset of Y, then X ∪ Y forms a frequent closed itemset and there
is no need to search for any itemset containing X but no Y.
For example, in Table 6.2 of Example 6.5, the projected conditional database for
preﬁx itemset {I5:2} is {{I2, I1}, {I2, I1, I3}}, from which we can see that each of its
transactions contains itemset {I2, I1} but no proper superset of {I2, I1}. Itemset {I2,
I1} can be merged with {I5} to form the closed itemset, {I5, I2, I1: 2}, and we do not
need to mine for closed itemsets that contain I5 but not {I2, I1}.

Sub-itemset pruning: If a frequent itemset X is a proper subset of an already found fre-
quent closed itemset Y and support count(X)=support count(Y), then X and all of X’s
descendants in the set enumeration tree cannot be frequent closed itemsets and thus can
be pruned.
Similar to Example 6.2, suppose a transaction database has only two trans-
actions: {ha 1 , a 2 ,..., a 100 i, ha 1 , a 2 ,..., a 50 i}, and the minimum support count is
min sup = 2. The projection on the ﬁrst item, a 1 , derives the frequent itemset, {a 1 ,
a 2 ,..., a 50 : 2}, based on the itemset merging optimization. Because support({a 2 }) =
support({a 1 , a 2 ,..., a 50 }) = 2, and {a 2 } is a proper subset of {a 1 , a 2 ,..., a 50 }, there
is no need to examine a 2 and its projected database. Similar pruning can be done
for a 3 ,..., a 50 as well. Thus, the mining of closed frequent itemsets in this data set
terminates after mining a 1 ’s projected database.

Item skipping: In the depth-ﬁrst mining of closed itemsets, at each level, there will be
a preﬁx itemset X associated with a header table and a projected database. If a local
frequent item p has the same support in several header tables at different levels, we can
safely prune p from the header tables at higher levels.
Consider, for example, the previous transaction database having only two trans-
actions: {ha 1 , a 2 ,..., a 100 i, ha 1 , a 2 ,..., a 50 i}, where min sup = 2. Because a 2 in a 1 ’s
projected database has the same support as a 2 in the global header table, a 2 can be
pruned from the global header table. Similar pruning can be done for a 3 ,..., a 50 .
There is no need to mine anything more after mining a 1 ’s projected database.
Besides pruning the search space in the closed itemset mining process, another
important optimization is to perform efﬁcient checking of each newly derived frequent
itemset to see whether it is closed. This is because the mining process cannot ensure that
every generated frequent itemset is closed.
When a new frequent itemset is derived, it is necessary to perform two kinds of
closure checking: (1) superset checking, which checks if this new frequent itemset is a
superset of some already found closed itemsets with the same support, and (2) subset
checking, which checks whether the newly found itemset is a subset of an already found
closed itemset with the same support.
If we adopt the item merging pruning method under a divide-and-conquer frame-
work, then the superset checking is actually built-in and there is no need to explicitly

295 296 297 298 299 300 301 302 303 304 305