Page 301 -
P. 301

2011/6/1
                               13-ch06-243-278-9780123814791
                         HAN
          264   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20 Page 264  #22



                         perform superset checking. This is because if a frequent itemset X ∪ Y is found later
                         than itemset X, and carries the same support as X, it must be in X’s projected database
                         and must have been generated during itemset merging.
                           To assist in subset checking, a compressed pattern-tree can be constructed to main-
                         tain the set of closed itemsets mined so far. The pattern-tree is similar in structure to the
                         FP-tree except that all the closed itemsets found are stored explicitly in the correspond-
                         ing tree branches. For efficient subset checking, we can use the following property: If the
                         current itemset S c can be subsumed by another already found closed itemset S a , then (1) S c
                         and S a have the same support, (2) the length of S c is smaller than that of S a , and (3) all of
                         the items in S c are contained in S a .
                           Based on this property, a two-level hash index structure can be built for fast access-
                         ing of the pattern-tree: The first level uses the identifier of the last item in S c as a hash key
                         (since this identifier must be within the branch of S c ), and the second level uses the sup-
                         port of S c as a hash key (since S c and S a have the same support). This will substantially
                         speed up the subset checking process.
                           This discussion illustrates methods for efficient mining of closed frequent itemsets.
                         “Can we extend these methods for efficient mining of maximal frequent itemsets?” Because
                         maximal frequent itemsets share many similarities with closed frequent itemsets, many
                         of the optimization techniques developed here can be extended to mining maximal
                         frequent itemsets. However, we leave this method as an exercise for interested readers.


                 6.3     Which Patterns Are Interesting?—Pattern

                         Evaluation Methods


                         Most association rule mining algorithms employ a support–confidence framework.
                         Although minimum support and confidence thresholds help weed out or exclude the
                         exploration of a good number of uninteresting rules, many of the rules generated are
                         still not interesting to the users. Unfortunately, this is especially true when mining at
                         low support thresholds or mining for long patterns. This has been a major bottleneck for
                         successful application of association rule mining.
                           In this section, we first look at how even strong association rules can be uninteresting
                         and misleading (Section 6.3.1). We then discuss how the support–confidence frame-
                         work can be supplemented with additional interestingness measures based on correlation
                         analysis (Section 6.3.2). Section 6.3.3 presents additional pattern evaluation measures.
                         It then provides an overall comparison of all the measures discussed here. By the end,
                         you will learn which pattern evaluation measures are most effective for the discovery of
                         only interesting rules.

                   6.3.1 Strong Rules Are Not Necessarily Interesting

                         Whether or not a rule is interesting can be assessed either subjectively or objectively.
                         Ultimately, only the user can judge if a given rule is interesting, and this judgment, being
   296   297   298   299   300   301   302   303   304   305   306