          Chapter 1 Introduction



                         say, 50% can be considered uninteresting. Rules below the threshold likely reflect noise,
                         exceptions, or minority cases and are probably of less value.
                           Other objective interestingness measures include accuracy and coverage for classifica-
                         tion (IF-THEN) rules. In general terms, accuracy tells us the percentage of data that are
                         correctly classified by a rule. Coverage is similar to support, in that it tells us the per-
                         centage of data to which a rule applies. Regarding understandability, we may use simple
                         objective measures that assess the complexity or length in bits of the patterns mined.
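As a minimal sketch of these two measures, the snippet below scores a single IF-THEN rule against a handful of labeled records. It assumes the common convention that coverage is the fraction of all records matching the rule's IF part, and accuracy is the fraction of those covered records whose label matches the rule's THEN part. The function name, field names, and toy records are hypothetical and serve only as illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def rule_coverage_accuracy(
    records: List[Record],
    label_key: str,
    condition: Callable[[Record], bool],
    predicted_label: object,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) of one IF-THEN rule.

    Assumed definitions:
      coverage = covered records / all records
      accuracy = correctly labeled covered records / covered records
    """
    if not records:
        return 0.0, 0.0
    covered = [r for r in records if condition(r)]
    coverage = len(covered) / len(records)
    if not covered:
        return coverage, 0.0
    correct = sum(1 for r in covered if r[label_key] == predicted_label)
    return coverage, correct / len(covered)


# Hypothetical toy data: five customer records with a class label "buys".
records = [
    {"age": "youth", "student": True, "buys": "yes"},
    {"age": "youth", "student": True, "buys": "yes"},
    {"age": "youth", "student": False, "buys": "no"},
    {"age": "senior", "student": False, "buys": "yes"},
    {"age": "youth", "student": True, "buys": "no"},
]

# Rule: IF age = youth AND student THEN buys = yes
coverage, accuracy = rule_coverage_accuracy(
    records,
    label_key="buys",
    condition=lambda r: r["age"] == "youth" and r["student"] is True,
    predicted_label="yes",
)
# The rule covers 3 of the 5 records and is correct on 2 of those 3.
```

Under these assumptions a rule can have high accuracy and low coverage (a precise rule for a small niche) or the reverse, which is why the two measures are usually reported together.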
                           Although objective measures help identify interesting patterns, they are often insuffi-
                         cient unless combined with subjective measures that reflect a particular user’s needs and
                         interests. For example, patterns describing the characteristics of customers who shop
                         frequently at AllElectronics should be interesting to the marketing manager, but may be
                         of little interest to other analysts studying the same database for patterns on employee
                         performance. Furthermore, many patterns that are interesting by objective standards
                         may represent common sense and, therefore, are actually uninteresting.
                           Subjective interestingness measures are based on a user’s beliefs about the data. These
                         measures find patterns interesting if the patterns are unexpected (contradicting a user’s
                         belief) or offer strategic information on which the user can act. In the latter case, such
                         patterns are referred to as actionable. For example, patterns like “a large earthquake
                         often follows a cluster of small quakes” may be highly actionable if users can act on the
                         information to save lives. Patterns that are expected can be interesting if they confirm a
                         hypothesis that the user wishes to validate or they resemble a user’s hunch.
                           The second question—“Can a data mining system generate all of the interesting pat-
                         terns?”—refers to the completeness of a data mining algorithm. It is often unrealistic
                         and inefficient for data mining systems to generate all possible patterns. Instead, user-
                         provided constraints and interestingness measures should be used to focus the search.
                         For some mining tasks, such as association rule mining, this is often sufficient to
                         ensure the completeness of the algorithm. The methods involved are examined in detail
                         in Chapter 6.
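The completeness claim for association mining can be illustrated with a small sketch. Assuming the user-provided constraint is a minimum support threshold, a levelwise (Apriori-style) search discards any candidate itemset that has an infrequent subset. It still returns every itemset that meets the threshold, because adding items to an itemset can never increase its support. The function name, the toy transactions, and the 0.6 threshold are illustrative assumptions, not taken from the text.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def frequent_itemsets(
    transactions: Iterable[Iterable[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Levelwise search for all itemsets with support >= min_support."""
    tx: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(tx)
    if n == 0:
        return {}

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1: frequent single items.
    level: Dict[FrozenSet[str], float] = {}
    for item in {i for t in tx for i in t}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            level[single] = s

    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        result.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, keeping only
        # candidates whose (k-1)-subsets are all frequent (the pruning step).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support only for the surviving candidates.
        level = {}
        for cand in candidates:
            s = support(cand)
            if s >= min_support:
                level[cand] = s
    return result


# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = frequent_itemsets(transactions, min_support=0.6)
```

On this toy data the search finds the four single items and the pair {bread, milk}. No three-item candidate is ever counted, since none can have all of its two-item subsets frequent.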
                           Finally, the third question—“Can a data mining system generate only interesting pat-
                         terns?”—is an optimization problem in data mining. It is highly desirable for data
                         mining systems to generate only interesting patterns. This would be efficient for users
                         and data mining systems because neither would have to search through the patterns gen-
                         erated to identify the truly interesting ones. Progress has been made in this direction;
                         however, such optimization remains a challenging issue in data mining.
                           Measures of pattern interestingness are essential for the efficient discovery of patterns
                         by target users. Such measures can be used after the data mining step to rank the dis-
                         covered patterns according to their interestingness, filtering out the uninteresting ones.
                         More important, such measures can be used to guide and constrain the discovery pro-
                         cess, improving the search efficiency by pruning away subsets of the pattern space that
                         do not satisfy prespecified interestingness constraints. Examples of such a constraint-
                         based mining process are described in Chapter 7 (with respect to pattern discovery) and
                         Chapter 11 (with respect to clustering).
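The post-mining use of interestingness measures can be sketched as a filter followed by a sort. The example assumes each discovered rule already carries its support and confidence, drops rules below user-chosen thresholds, and ranks the rest by confidence, then by support. The `Rule` class, the rule contents, and all numeric values are hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float      # fraction of transactions containing both sides
    confidence: float   # fraction of antecedent matches that also match the consequent


def rank_interesting(
    rules: List[Rule], min_support: float, min_confidence: float
) -> List[Rule]:
    """Filter out rules below either threshold, then rank the survivors."""
    kept = [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
    # Highest confidence first; ties are broken by higher support.
    return sorted(kept, key=lambda r: (r.confidence, r.support), reverse=True)


# Hypothetical mined rules with made-up measure values.
mined = [
    Rule(frozenset({"computer"}), frozenset({"software"}), 0.10, 0.70),
    Rule(frozenset({"printer"}), frozenset({"paper"}), 0.02, 0.90),
    Rule(frozenset({"camera"}), frozenset({"memory card"}), 0.08, 0.45),
    Rule(frozenset({"phone"}), frozenset({"case"}), 0.12, 0.70),
]
ranked = rank_interesting(mined, min_support=0.05, min_confidence=0.50)
```

Here the printer rule is dropped for low support and the camera rule for low confidence. Applying the same thresholds during the search, rather than afterward, is what allows a mining algorithm to prune parts of the pattern space before they are ever enumerated.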