Page 329 - Computational Statistics Handbook with MATLAB
evaluating the classifier. In Section 9.4, we illustrate how to construct classi-
fication trees. Section 9.5 contains methods for unsupervised classification or
clustering, including agglomerative methods and k-means clustering.
We first describe the process of statistical pattern recognition in a super-
vised learning setting. With supervised learning, we have cases or observa-
tions where we know which class each case belongs to. Figure 9.1 illustrates
the major steps of statistical pattern recognition.
The first step in pattern recognition is to select features that will be used to
distinguish between the classes. As the reader might suspect, the choice of
features is perhaps the most important part of the process. Building accurate
classifiers is much easier with features that allow one to readily distinguish
between classes.
Once features are selected, we obtain a sample of these features for the dif-
ferent classes. This means that we find objects that belong to the classes of
interest and then measure the features. Each observed set of feature measure-
ments (sometimes also called a case or pattern) has a class label attached to
it. Now that we have data known to belong to the different classes, we can
use this information to build a classifier: a procedure that takes a set of
feature measurements as input and outputs the class to which it belongs.
How these classifiers are created is the topic of this chapter.
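Here is a minimal sketch of this idea in Python. It is not a method taken from this chapter. It uses a nearest-centroid rule, chosen only for illustration. Training averages the labeled feature vectors of each class, and a new case is assigned to the class with the closest average. The function names `fit_centroids` and `classify` and the data values are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def fit_centroids(features, labels):
    """Training step: average the labeled feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(centroids, x):
    """Return the class label whose centroid is closest to the new case x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical two-feature training data, each case with a known class label.
X = [[1.0, 1.2], [0.8, 1.0], [5.0, 5.1], [5.2, 4.9]]
labels = ["w1", "w1", "w2", "w2"]
centroids = fit_centroids(X, labels)
print(classify(centroids, [0.9, 1.1]))  # w1
```

The training step relies on the class label attached to each case. That reliance is what makes the procedure supervised.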
FIGURE 9.1
[Schematic: Object → Sensor → Feature Extractor → Classification → Class Membership (w1, w2, ..., wJ)]
This shows a schematic diagram of the major steps for statistical pattern recognition.
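The schematic can be read as a composition of stages, where each stage feeds its output to the next. The Python sketch below wires hypothetical stand-ins for the sensor, feature extractor, and classifier into one function. The names `make_pipeline`, `sensor`, `extractor`, and `classifier`, the dictionary layout, and the 2.5 threshold are all illustrative assumptions. None of them comes from the text.

```python
from typing import Callable


def make_pipeline(sensor: Callable, extractor: Callable, classifier: Callable) -> Callable:
    """Chain the stages: object -> sensor -> feature extractor -> classification."""
    def recognize(obj):
        raw = sensor(obj)            # measure the object
        features = extractor(raw)    # reduce raw measurements to a feature vector
        return classifier(features)  # assign a class membership label
    return recognize


# Hypothetical stand-ins for each stage, for illustration only.
sensor = lambda obj: obj["measurements"]
extractor = lambda raw: [raw["length"], raw["width"]]
classifier = lambda f: "w1" if f[0] < 2.5 else "w2"  # arbitrary threshold

recognize = make_pipeline(sensor, extractor, classifier)
print(recognize({"measurements": {"length": 1.4, "width": 0.2}}))  # w1
```

Keeping the stages separate mirrors the figure. A different feature extractor or classifier can be swapped in without changing the other stages.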
One of the main examples we use to illustrate these ideas is one that we
encountered in Chapter 5. In the iris data set, we have three species of iris:
Iris setosa, Iris versicolor and Iris virginica. The data were used by Fisher [1936]
to develop a classifier that would take measurements from a new iris and
determine its species based on the features [Hand, et al., 1994]. The four fea-
tures that are used to distinguish the species of iris are sepal length, sepal
width, petal length and petal width. The next step in the pattern recognition
process is to find many flowers from each species and measure the corre-
sponding sepal length, sepal width, petal length, and petal width. For each
set of measured features, we attach a class label that indicates which species
© 2002 by Chapman & Hall/CRC

