Page 59 -
P. 59
HAN 08-ch01-001-038-9780123814791
22 Chapter 1 Introduction 2011/6/1 3:12 Page 22 #22
say, 50% can be considered uninteresting. Rules below the threshold likely reflect noise,
exceptions, or minority cases and are probably of less value.
Other objective interestingness measures include accuracy and coverage for classifica-
tion (IF-THEN) rules. In general terms, accuracy tells us the percentage of data that are
correctly classified by a rule. Coverage is similar to support, in that it tells us the per-
centage of data to which a rule applies. Regarding understandability, we may use simple
objective measures that assess the complexity or length in bits of the patterns mined.
Although objective measures help identify interesting patterns, they are often insuffi-
cient unless combined with subjective measures that reflect a particular user’s needs and
interests. For example, patterns describing the characteristics of customers who shop
frequently at AllElectronics should be interesting to the marketing manager, but may be
of little interest to other analysts studying the same database for patterns on employee
performance. Furthermore, many patterns that are interesting by objective standards
may represent common sense and, therefore, are actually uninteresting.
Subjective interestingness measures are based on user beliefs in the data. These
measures find patterns interesting if the patterns are unexpected (contradicting a user’s
belief) or offer strategic information on which the user can act. In the latter case, such
patterns are referred to as actionable. For example, patterns like “a large earthquake
often follows a cluster of small quakes” may be highly actionable if users can act on the
information to save lives. Patterns that are expected can be interesting if they confirm a
hypothesis that the user wishes to validate or they resemble a user’s hunch.
The second question—“Can a data mining system generate all of the interesting pat-
terns?”—refers to the completeness of a data mining algorithm. It is often unrealistic
and inefficient for data mining systems to generate all possible patterns. Instead, user-
provided constraints and interestingness measures should be used to focus the search.
For some mining tasks, such as association, this is often sufficient to ensure the com-
pleteness of the algorithm. Association rule mining is an example where the use of
constraints and interestingness measures can ensure the completeness of mining. The
methods involved are examined in detail in Chapter 6.
Finally, the third question—“Can a data mining system generate only interesting pat-
terns?”—is an optimization problem in data mining. It is highly desirable for data
mining systems to generate only interesting patterns. This would be efficient for users
and data mining systems because neither would have to search through the patterns gen-
erated to identify the truly interesting ones. Progress has been made in this direction;
however, such optimization remains a challenging issue in data mining.
Measures of pattern interestingness are essential for the efficient discovery of patterns
by target users. Such measures can be used after the data mining step to rank the dis-
covered patterns according to their interestingness, filtering out the uninteresting ones.
More important, such measures can be used to guide and constrain the discovery pro-
cess, improving the search efficiency by pruning away subsets of the pattern space that
do not satisfy prespecified interestingness constraints. Examples of such a constraint-
based mining process are described in Chapter 7 (with respect to pattern discovery) and
Chapter 11 (with respect to clustering).