

say, 50% can be considered uninteresting. Rules below the threshold likely reflect noise,
exceptions, or minority cases and are probably of less value.
Other objective interestingness measures include accuracy and coverage for classification
(IF-THEN) rules. In general terms, accuracy tells us the percentage of data that are
correctly classified by a rule. Coverage is similar to support, in that it tells us the
percentage of data to which a rule applies. Regarding understandability, we may use simple
objective measures that assess the complexity or length in bits of the patterns mined.
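As a minimal sketch of these two measures, the snippet below computes coverage and accuracy
for a single IF-THEN rule over a toy data set. It assumes the common convention that
coverage is the fraction of all tuples satisfying the rule antecedent and that accuracy is
the fraction of those covered tuples whose class matches the rule consequent. The function
name, the record layout, and the data values are hypothetical and chosen only for
illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def rule_coverage_accuracy(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    predicted_label: str,
    label_key: str = "class",
) -> Tuple[float, float]:
    """Return (coverage, accuracy) of the rule IF antecedent THEN predicted_label.

    Assumed definitions (not stated as formulas in the text):
      coverage = covered tuples / all tuples
      accuracy = correctly classified covered tuples / covered tuples
    """
    covered = [r for r in records if antecedent(r)]
    n_covers = len(covered)
    n_correct = sum(1 for r in covered if r[label_key] == predicted_label)
    coverage = n_covers / len(records) if records else 0.0
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# Hypothetical toy data: five customers with a known class label.
records = [
    {"age": "youth", "student": "yes", "class": "buys"},
    {"age": "youth", "student": "yes", "class": "buys"},
    {"age": "youth", "student": "no", "class": "no"},
    {"age": "senior", "student": "yes", "class": "no"},
    {"age": "youth", "student": "yes", "class": "no"},
]

# Rule: IF age = youth AND student = yes THEN class = buys
coverage, accuracy = rule_coverage_accuracy(
    records,
    lambda r: r["age"] == "youth" and r["student"] == "yes",
    "buys",
)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Here the rule applies to three of the five tuples and classifies two of those three
correctly, so a rule can cover much of the data and still be only moderately accurate.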
Although objective measures help identify interesting patterns, they are often insufficient
unless combined with subjective measures that reflect a particular user’s needs and
interests. For example, patterns describing the characteristics of customers who shop
frequently at AllElectronics should be interesting to the marketing manager, but may be
of little interest to other analysts studying the same database for patterns on employee
performance. Furthermore, many patterns that are interesting by objective standards
may represent common sense and, therefore, are actually uninteresting.
Subjective interestingness measures are based on a user’s beliefs about the data. These
measures find patterns interesting if the patterns are unexpected (contradicting a user’s
belief) or offer strategic information on which the user can act. In the latter case, such
patterns are referred to as actionable. For example, patterns like “a large earthquake
often follows a cluster of small quakes” may be highly actionable if users can act on the
information to save lives. Patterns that are expected can also be interesting if they
confirm a hypothesis that the user wishes to validate or if they resemble a user’s hunch.
The second question, “Can a data mining system generate all of the interesting patterns?”,
refers to the completeness of a data mining algorithm. It is often unrealistic
and inefficient for data mining systems to generate all possible patterns. Instead,
user-provided constraints and interestingness measures should be used to focus the search.
For some mining tasks, such as association rule mining, the use of such constraints and
measures is often sufficient to ensure the completeness of the algorithm. The
methods involved are examined in detail in Chapter 6.
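To illustrate how a constraint can focus the search without sacrificing completeness, here
is a minimal sketch of level-wise frequent itemset mining under a minimum support
threshold. It relies on the well-known property that every subset of a frequent itemset is
itself frequent, so candidates with an infrequent subset can be skipped safely. This is an
illustrative simplification, not the algorithm as presented in Chapter 6; the function
names and the transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Level-wise search that keeps only itemsets with support >= min_support."""
    tx: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(tx)
    items = sorted({i for t in tx for i in t})
    result: Dict[FrozenSet[str], float] = {}
    level = [frozenset([i]) for i in items]  # size-1 candidates
    while level:
        kept = {}
        for cand in level:
            support = sum(1 for t in tx if cand <= t) / n
            if support >= min_support:
                kept[cand] = support
        result.update(kept)
        # Build size-(k+1) candidates from pairs of frequent size-k itemsets,
        # then prune any candidate that has an infrequent size-k subset.
        k = len(next(iter(kept))) + 1 if kept else 0
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        ]
    return result


def brute_force(
    transactions: Iterable[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Enumerate every possible itemset; used only to check completeness."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    items = sorted({i for t in tx for i in t})
    out = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            cand = frozenset(combo)
            support = sum(1 for t in tx if cand <= t) / n
            if support >= min_support:
                out[cand] = support
    return out


# Hypothetical transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer", "printer"},
    {"software"},
]
pruned = frequent_itemsets(transactions, min_support=0.5)
exhaustive = brute_force(transactions, min_support=0.5)
print(pruned == exhaustive)
```

On this toy input the pruned search and the exhaustive enumeration return the same
itemsets, which is the sense in which the constrained search remains complete: it finds
every pattern that satisfies the threshold while examining fewer candidates.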
Finally, the third question, “Can a data mining system generate only interesting
patterns?”, is an optimization problem in data mining. It is highly desirable for data
mining systems to generate only interesting patterns. This would be efficient for users
and data mining systems because neither would have to search through the patterns
generated to identify the truly interesting ones. Progress has been made in this direction;
however, such optimization remains a challenging issue in data mining.
Measures of pattern interestingness are essential for the efficient discovery of patterns
by target users. Such measures can be used after the data mining step to rank the
discovered patterns according to their interestingness, filtering out the uninteresting ones.
More important, such measures can be used to guide and constrain the discovery
process, improving the search efficiency by pruning away subsets of the pattern space that
do not satisfy prespecified interestingness constraints. Examples of such a
constraint-based mining process are described in Chapter 7 (with respect to pattern
discovery) and Chapter 11 (with respect to clustering).
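The post-mining use of interestingness measures can be sketched as a simple filter followed
by a sort. The example below assumes that each discovered rule already carries a support
and a confidence value; the `Rule` class, the thresholds, and the rule values are
hypothetical, and ranking by confidence and then support is only one possible ordering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float
    confidence: float


def rank_interesting(
    rules: List[Rule], min_support: float, min_confidence: float
) -> List[Rule]:
    """Drop rules below either threshold, then rank the rest.

    Ranking key (an assumption): higher confidence first, ties broken by support.
    """
    kept = [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda r: (r.confidence, r.support), reverse=True)


# Hypothetical mined rules with made-up measure values.
rules = [
    Rule("computer", "software", support=0.30, confidence=0.70),
    Rule("printer", "ink", support=0.05, confidence=0.90),   # too little support
    Rule("laptop", "mouse", support=0.20, confidence=0.40),  # too little confidence
    Rule("phone", "case", support=0.25, confidence=0.80),
]

ranked = rank_interesting(rules, min_support=0.10, min_confidence=0.50)
for r in ranked:
    print(f"{r.antecedent} => {r.consequent} "
          f"[support={r.support:.2f}, confidence={r.confidence:.2f}]")
```

Filtering after mining is simple but still pays the cost of generating every rule first,
which is why pushing the same thresholds into the search itself is usually preferable.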
