


          18    Chapter 1 Introduction                       2011/6/1  3:12  Page 18  #18



                           Typically, association rules are discarded as uninteresting if they do not satisfy both a
                         minimum support threshold and a minimum confidence threshold. Additional anal-
                         ysis can be performed to uncover interesting statistical correlations between associated
                         attribute–value pairs.
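The two thresholds can be illustrated with a minimal sketch. The function names, the toy transactions, and the threshold values below are hypothetical and chosen only for illustration; support is taken as the fraction of transactions containing all items of a rule, and confidence as the fraction of transactions containing the antecedent that also contain the consequent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with `antecedent` that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


def is_interesting(transactions, antecedent, consequent,
                   min_support, min_confidence):
    """Keep a rule only if it satisfies BOTH thresholds."""
    s = support(transactions, antecedent | consequent)
    c = confidence(transactions, antecedent, consequent)
    return s >= min_support and c >= min_confidence


# Hypothetical toy transactions (not taken from the text).
transactions = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]

rule_lhs = frozenset({"computer"})
rule_rhs = frozenset({"software"})
keep = is_interesting(transactions, rule_lhs, rule_rhs,
                      min_support=0.4, min_confidence=0.6)
```

In this toy data the rule computer → software appears in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain computer (confidence about 0.67), so it passes the example thresholds. Raising `min_confidence` to 0.7 would discard it.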
                           Frequent itemset mining is a fundamental form of frequent pattern mining. The min-
                         ing of frequent patterns, associations, and correlations is discussed in Chapters 6 and 7,
                         where particular emphasis is placed on efficient algorithms for frequent itemset min-
                         ing. Sequential pattern mining and structured pattern mining are considered advanced
                         topics.

                   1.4.3 Classification and Regression for Predictive Analysis

                         Classification is the process of finding a model (or function) that describes and
                         distinguishes data classes or concepts. The model is derived based on the analysis of a
                         set of training data (i.e., data objects for which the class labels are known). The model
                         is used to predict the class label of objects for which the class label is unknown.
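The derive-then-predict workflow can be sketched as follows. This is not a method described in the text: it is a deliberately simple one-rule style learner, used here as a stand-in, that maps each value of a single attribute to the majority class label seen in the training data. The function names, the `default` parameter, and the toy training set are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_one_rule(training_data, attribute):
    """Derive a model from labeled objects.

    The model maps each observed value of `attribute` to the class label
    that occurs most often with that value in the training data.
    """
    counts = defaultdict(Counter)
    for obj, label in training_data:
        counts[obj[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, obj, attribute, default=None):
    """Predict the class label of an object whose label is unknown."""
    return model.get(obj[attribute], default)


# Hypothetical training data: (object, known class label) pairs.
training_data = [
    ({"age": "youth"}, "A"),
    ({"age": "youth"}, "A"),
    ({"age": "youth"}, "B"),
    ({"age": "senior"}, "C"),
    ({"age": "middle_aged"}, "C"),
]

model = train_one_rule(training_data, "age")
label = predict(model, {"age": "youth"}, "age")
```

The split into a training step that sees labels and a prediction step that does not is the point of the sketch; a realistic classifier would use more attributes and a richer model.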
                           “How is the derived model presented?” The derived model may be represented in
                         various forms, such as classification rules (i.e., IF-THEN rules), decision trees,
                         mathematical formulae, or neural networks (Figure 1.9). A decision tree is a flowchart-like tree structure,
                         where each node denotes a test on an attribute value, each branch represents an outcome
                         of the test, and tree leaves represent classes or class distributions. Decision trees can easily


                                       age(X, “youth”) AND income(X, “high”) → class(X, “A”)
                                       age(X, “youth”) AND income(X, “low”)  → class(X, “B”)
                                       age(X, “middle_aged”)                 → class(X, “C”)
                                       age(X, “senior”)                      → class(X, “C”)
                                                           (a)


                                 (b) Decision tree:

                                        age?
                                          youth: income?
                                            high: class A
                                            low: class B
                                          middle_aged, senior: class C

                                 (c) Neural network: input units f1 (age) and f2 (income) feed units
                                     f3, f4, and f5, which feed units f6 (class A), f7 (class B), and
                                     f8 (class C).


               Figure 1.9 A classification model can be represented in various forms: (a) IF-THEN rules, (b) a decision
                         tree, or (c) a neural network.
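As a minimal sketch, the IF-THEN rules of Figure 1.9(a) and the decision tree of Figure 1.9(b) can be written out and checked against each other. The nested-dictionary encoding of the tree, the function names, and the choice to return `None` when no rule or branch applies are assumptions made for illustration; the figure itself does not specify them.

```python
def classify_by_rules(age, income):
    """Apply the IF-THEN rules of Figure 1.9(a), checked in order."""
    if age == "youth" and income == "high":
        return "A"
    if age == "youth" and income == "low":
        return "B"
    if age == "middle_aged":
        return "C"
    if age == "senior":
        return "C"
    return None  # assumption: no rule fires for unseen values


# Decision tree of Figure 1.9(b). Internal nodes are dicts holding the
# attribute to test and one branch per outcome; leaves are class labels.
TREE = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "income",
            "branches": {"high": "A", "low": "B"},
        },
        "middle_aged": "C",
        "senior": "C",
    },
}


def classify_by_tree(node, obj):
    """Walk from the root, following the branch that matches each test."""
    while isinstance(node, dict):
        node = node["branches"].get(obj[node["attribute"]])
    return node  # a class label, or None if no branch matched


# Both representations agree on every combination of attribute values.
for age in ("youth", "middle_aged", "senior"):
    for income in ("high", "low"):
        obj = {"age": age, "income": income}
        assert classify_by_rules(age, income) == classify_by_tree(TREE, obj)
```

Each root-to-leaf path in the tree corresponds to one rule: the tests along the path form the rule's condition and the leaf gives its class.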