Page 301 -
P. 301
2011/6/1
13-ch06-243-278-9780123814791
HAN
264 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 264 #22
perform superset checking. This is because if a frequent itemset X ∪ Y is found later
than itemset X, and carries the same support as X, it must be in X’s projected database
and must have been generated during itemset merging.
To assist in subset checking, a compressed pattern-tree can be constructed to main-
tain the set of closed itemsets mined so far. The pattern-tree is similar in structure to the
FP-tree except that all the closed itemsets found are stored explicitly in the correspond-
ing tree branches. For efficient subset checking, we can use the following property: If the
current itemset S c can be subsumed by another already found closed itemset S a , then (1) S c
and S a have the same support, (2) the length of S c is smaller than that of S a , and (3) all of
the items in S c are contained in S a .
Based on this property, a two-level hash index structure can be built for fast access-
ing of the pattern-tree: The first level uses the identifier of the last item in S c as a hash key
(since this identifier must be within the branch of S c ), and the second level uses the sup-
port of S c as a hash key (since S c and S a have the same support). This will substantially
speed up the subset checking process.
This discussion illustrates methods for efficient mining of closed frequent itemsets.
“Can we extend these methods for efficient mining of maximal frequent itemsets?” Because
maximal frequent itemsets share many similarities with closed frequent itemsets, many
of the optimization techniques developed here can be extended to mining maximal
frequent itemsets. However, we leave this method as an exercise for interested readers.
6.3 Which Patterns Are Interesting?—Pattern
Evaluation Methods
Most association rule mining algorithms employ a support–confidence framework.
Although minimum support and confidence thresholds help weed out or exclude the
exploration of a good number of uninteresting rules, many of the rules generated are
still not interesting to the users. Unfortunately, this is especially true when mining at
low support thresholds or mining for long patterns. This has been a major bottleneck for
successful application of association rule mining.
In this section, we first look at how even strong association rules can be uninteresting
and misleading (Section 6.3.1). We then discuss how the support–confidence frame-
work can be supplemented with additional interestingness measures based on correlation
analysis (Section 6.3.2). Section 6.3.3 presents additional pattern evaluation measures.
It then provides an overall comparison of all the measures discussed here. By the end,
you will learn which pattern evaluation measures are most effective for the discovery of
only interesting rules.
6.3.1 Strong Rules Are Not Necessarily Interesting
Whether or not a rule is interesting can be assessed either subjectively or objectively.
Ultimately, only the user can judge if a given rule is interesting, and this judgment, being