Page 329 - Computational Statistics Handbook with MATLAB


                             evaluating the classifier. In Section 9.4, we illustrate how to construct classi-
                             fication trees. Section 9.5 contains methods for unsupervised classification or
                             clustering, including agglomerative methods and k-means clustering.
                              We first describe the process of statistical pattern recognition in a super-
                             vised learning setting. With supervised learning, we have cases or observa-
                             tions where we know which class each case belongs to. Figure 9.1 illustrates
                             the major steps of statistical pattern recognition.
                              The first step in pattern recognition is to select features that will be used to
                             distinguish between the classes. As the reader might suspect, the choice of
                             features is perhaps the most important part of the process. Building accurate
                             classifiers is much easier with features that allow one to readily distinguish
                             between classes.
                              Once features are selected, we obtain a sample of these features for the dif-
                             ferent classes. This means that we find objects that belong to the classes of
                             interest and then measure the features. Each observed set of feature measure-
                             ments (sometimes also called a case or pattern) has a class label attached to
                             it. Now that we have data that are known to belong to the different classes,
                             we can use this information to build a classifier: a rule that takes as input
                             a set of feature measurements and outputs the class to which it belongs. How
                             these classifiers are created is the topic of this chapter.
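
To make the input and output of a classifier concrete, the following is a minimal sketch and not one of the methods developed in this chapter. It assumes a made-up two-feature training sample (the matrix X and the vector labels are hypothetical) and uses a nearest-class-mean rule purely as a stand-in classifier. Each row of X is one case, and labels holds the known class of each case.

```matlab
% Hypothetical training sample: each row is one case (pattern) and each
% column is one feature. The numbers are made up for illustration only.
X = [5.0 1.4;
     4.9 1.5;
     5.1 1.3;
     6.4 4.5;
     6.0 4.4;
     6.6 4.6];
labels = [1; 1; 1; 2; 2; 2];   % known class label for each row of X

% Build a stand-in classifier: store the mean feature vector of each class.
classes = unique(labels);
J = numel(classes);
mu = zeros(J, size(X, 2));
for j = 1:J
    mu(j, :) = mean(X(labels == classes(j), :), 1);
end

% Classify a new set of feature measurements by the closest class mean.
xnew = [6.2 4.3];
d = zeros(J, 1);
for j = 1:J
    d(j) = sum((xnew - mu(j, :)).^2);   % squared Euclidean distance
end
[~, idx] = min(d);
predicted = classes(idx);
fprintf('Predicted class: %d\n', predicted);
```

The point of the sketch is the data layout, which is a feature matrix paired with a vector of class labels, and the fact that the finished classifier maps a new feature vector to a single class. The nearest-mean rule is only one simple way to fill in that mapping.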





                              FIGURE 9.1
                              This shows a schematic diagram of the major steps for statistical pattern recognition:
                              Object -> Sensor -> Feature Extractor -> Classification -> Class Membership (w_1, w_2, ..., w_J).
                              One of the main examples we use to illustrate these ideas is one that we
                             encountered in Chapter 5. In the iris data set, we have three species of iris:
                             Iris setosa, Iris versicolor and Iris virginica. The data were used by Fisher [1936]
                             to develop a classifier that would take measurements from a new iris and
                             determine its species based on the features [Hand, et al., 1994]. The four fea-
                             tures that are used to distinguish the species of iris are sepal length, sepal
                             width, petal length and petal width. The next step in the pattern recognition
                             process is to find many flowers from each species and measure the corre-
                             sponding sepal length, sepal width, petal length, and petal width. For each
                             set of measured features, we attach a class label that indicates which species

                            © 2002 by Chapman & Hall/CRC