Page 330 - Computational Statistics Handbook with MATLAB
Chapter 9: Statistical Pattern Recognition
it belongs to. We build a classifier using these data and (possibly) one of the
techniques that are described in this chapter. To use the classifier, we measure
the four features for an iris of unknown species and use the classifier to assign
the species membership.
Sometimes we are in a situation where we do not know the class member-
ship for our observations. Perhaps we are unable or unwilling to assume how
many groups are represented by the data. In this case, we are in the unsuper-
vised learning mode. To illustrate this, say we have data that comprise mea-
surements of a type of insect called Chaetocnema [Lindsey, Herzberg, and
Watts, 1987; Hand et al., 1994]. These variables measure the width of the first
joint of the first tarsus, the width of the first joint of the second tarsus, and the
maximal width of the aedeagus. All measurements are in microns. We suspect
that there are three species represented by these data. To explore this hypoth-
esis further, we could use one of the unsupervised learning or clustering tech-
niques that will be covered in Section 9.5.
9.2 Bayes Decision Theory
The Bayes approach to pattern classification is a fundamental technique, and
we recommend it as the starting point for most pattern recognition applica-
tions. If this method is not adequate, then more complicated techniques may
be used (e.g., neural networks, classification trees). Bayes decision theory
poses the classification problem in terms of probabilities; therefore, all of the
probabilities must be known or estimated from the data. We will see that this
is an excellent application of the probability density estimation methods from
Chapter 8.
We have already seen an application of Bayes decision theory in Chapter 2.
There we wanted to know the probability that a piston ring came from a par-
ticular manufacturer given that it failed. It makes sense to make the decision
that the part came from the manufacturer that has the highest posterior prob-
ability. To put this in the pattern recognition context, we could think of the
part failing as the feature. The resulting classification would be the manufac-
turer (M_A or M_B) that sold us the part. In the following, we will see that
Bayes decision theory is an application of Bayes’ Theorem, where we will
classify observations using the posterior probabilities.
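The piston ring example can be sketched numerically. The priors and failure rates below are made-up numbers for illustration only (the book's own code is in MATLAB; this is a Python sketch): the posterior for each manufacturer is computed from Bayes' Theorem, and the decision goes to the manufacturer with the highest posterior.

```python
# Made-up illustration numbers: priors P(M_A), P(M_B) are the fractions
# of parts bought from each manufacturer; the likelihoods are each
# manufacturer's failure rate, P(fail | manufacturer).
priors = {"M_A": 0.6, "M_B": 0.4}
p_fail = {"M_A": 0.05, "M_B": 0.10}

# Joint probabilities P(fail, m) = P(fail | m) * P(m).
joint = {m: p_fail[m] * priors[m] for m in priors}

# P(fail) is the normalizing constant (total probability of failure).
p_fail_total = sum(joint.values())

# Posterior P(m | fail) by Bayes' Theorem.
posterior = {m: joint[m] / p_fail_total for m in priors}

# Classify: decide for the manufacturer with the highest posterior.
decision = max(posterior, key=posterior.get)
```

With these numbers the decision is M_B: even though most parts come from M_A, the higher failure rate of M_B makes it the more probable source of a failed part, which is exactly the kind of reversal the posterior captures.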
We start off by fixing some notation. Let the class membership be repre-
sented by ω_j, j = 1, …, J, for a total of J classes. For example, with the iris
data, we have J = 3 classes:
ω 1 = Iris setosa
ω 2 = Iris versicolor
ω 3 = Iris virginica.
© 2002 by Chapman & Hall/CRC

