Page 300 -
P. 300

3:20 Page 263
                                                                                    #21
                               13-ch06-243-278-9780123814791
                                                            2011/6/1
                         HAN
                                                               6.2 Frequent Itemset Mining Methods  263


                               can identify the case of closed itemsets during mining. Pruning strategies include the
                               following:
                               Item merging: If every transaction containing a frequent itemset X also contains an itemset
                                Y but not any proper superset of Y, then X ∪ Y forms a frequent closed itemset and there
                                is no need to search for any itemset containing X but no Y.
                                   For example, in Table 6.2 of Example 6.5, the projected conditional database for
                                prefix itemset {I5:2} is {{I2, I1}, {I2, I1, I3}}, from which we can see that each of its
                                transactions contains itemset {I2, I1} but no proper superset of {I2, I1}. Itemset {I2,
                                I1} can be merged with {I5} to form the closed itemset, {I5, I2, I1: 2}, and we do not
                                need to mine for closed itemsets that contain I5 but not {I2, I1}.


                               Sub-itemset pruning: If a frequent itemset X is a proper subset of an already found fre-
                                quent closed itemset Y and support count(X)=support count(Y), then X and all of X’s
                                descendants in the set enumeration tree cannot be frequent closed itemsets and thus can
                                be pruned.
                                   Similar to Example 6.2, suppose a transaction database has only two trans-
                                actions: {ha 1 , a 2 ,..., a 100 i, ha 1 , a 2 ,..., a 50 i}, and the minimum support count is
                                min sup = 2. The projection on the first item, a 1 , derives the frequent itemset, {a 1 ,
                                a 2 ,..., a 50 : 2}, based on the itemset merging optimization. Because support({a 2 }) =
                                support({a 1 , a 2 ,..., a 50 }) = 2, and {a 2 } is a proper subset of {a 1 , a 2 ,..., a 50 }, there
                                is no need to examine a 2 and its projected database. Similar pruning can be done
                                for a 3 ,..., a 50 as well. Thus, the mining of closed frequent itemsets in this data set
                                terminates after mining a 1 ’s projected database.

                               Item skipping: In the depth-first mining of closed itemsets, at each level, there will be
                                a prefix itemset X associated with a header table and a projected database. If a local
                                frequent item p has the same support in several header tables at different levels, we can
                                safely prune p from the header tables at higher levels.
                                   Consider, for example, the previous transaction database having only two trans-
                                actions: {ha 1 , a 2 ,..., a 100 i, ha 1 , a 2 ,..., a 50 i}, where min sup = 2. Because a 2 in a 1 ’s
                                projected database has the same support as a 2 in the global header table, a 2 can be
                                pruned from the global header table. Similar pruning can be done for a 3 ,..., a 50 .
                                There is no need to mine anything more after mining a 1 ’s projected database.
                                 Besides pruning the search space in the closed itemset mining process, another
                               important optimization is to perform efficient checking of each newly derived frequent
                               itemset to see whether it is closed. This is because the mining process cannot ensure that
                               every generated frequent itemset is closed.
                                 When a new frequent itemset is derived, it is necessary to perform two kinds of
                               closure checking: (1) superset checking, which checks if this new frequent itemset is a
                               superset of some already found closed itemsets with the same support, and (2) subset
                               checking, which checks whether the newly found itemset is a subset of an already found
                               closed itemset with the same support.
                                 If we adopt the item merging pruning method under a divide-and-conquer frame-
                               work, then the superset checking is actually built-in and there is no need to explicitly
   295   296   297   298   299   300   301   302   303   304   305