When no parametric structure can be assumed for the density functions,
we must use nonparametric techniques such as the Parzen and k-nearest neigh-
bor approaches for estimating density functions. In Chapter 6, we develop the
basic statistical properties of these estimates.
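As a rough sketch (not taken from Chapter 6), the two estimates can be written as
follows; the Gaussian kernel, the kernel width h, and the neighbor count k are
illustrative choices that the user must supply.

    import numpy as np
    from scipy.special import gamma

    def parzen_density(x, samples, h):
        # Parzen estimate: average of Gaussian kernels of width h centered
        # at the learning samples, evaluated at the point x.
        n, d = samples.shape
        sq_dist = np.sum((samples - x) ** 2, axis=1)
        kernels = np.exp(-sq_dist / (2 * h * h)) / ((2 * np.pi * h * h) ** (d / 2))
        return kernels.mean()

    def knn_density(x, samples, k):
        # k-nearest-neighbor estimate: k / (n * volume of the smallest
        # hypersphere around x that contains k learning samples).
        n, d = samples.shape
        r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]
        volume = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d
        return k / (n * volume)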
Then, in Chapter 7, the nonparametric density estimates are applied to
classification problems. The main topic in Chapter 7 is the estimation of the
Bayes error without assuming any mathematical form for the density functions.
In general, nonparametric techniques are very sensitive to their control parameters
(such as the kernel size or the number of neighbors), and tend to give heavily
biased results unless the values of
these parameters are carefully chosen. Chapter 7 presents an extensive discus-
sion of how to select these parameter values.
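A toy experiment illustrates this sensitivity. The data, the kernel widths, and
the plug-in rule below (classify a sample to the class with the larger Parzen
density estimate) are hypothetical illustrations, not the estimation procedures
of Chapter 7; the point is only that the estimated error changes markedly with
the kernel width h.

    import numpy as np

    rng = np.random.default_rng(0)
    # Two hypothetical one-dimensional Gaussian classes (Bayes error is about 0.16).
    train1 = rng.normal(0.0, 1.0, 200)
    train2 = rng.normal(2.0, 1.0, 200)
    test = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
    truth = np.concatenate([np.zeros(500), np.ones(500)])

    def parzen(points, samples, h):
        # Gaussian-kernel density estimate at each point, kernel width h
        d2 = (points[:, None] - samples[None, :]) ** 2
        return np.exp(-d2 / (2 * h * h)).mean(axis=1) / (np.sqrt(2 * np.pi) * h)

    for h in (0.01, 0.3, 3.0):
        # Plug-in rule: pick the class whose estimated density is larger
        pred = (parzen(test, train2, h) > parzen(test, train1, h)).astype(float)
        print(f"h = {h}: estimated error = {np.mean(pred != truth):.3f}")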
In Fig. 1-2, we presented decision-making as dividing a high-
dimensional space. An alternative view is to consider decision-making as a
dictionary search. That is, all past experiences (learning samples) are stored in
a memory (a dictionary), and a test sample is classified to the class of the
closest sample in the dictionary. This process is called the nearest neighbor
classification rule, and it is widely regarded as a decision-making process
close to that of a human being. Figure 1-4 shows an example of
the decision boundary due to this classifier. Again, the classifier divides the
space into two regions, but in a somewhat more complex and sample-
dependent way than the boundary of Fig. 1-2. This is a nonparametric
classifier discussed in Chapter 7.
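A minimal sketch of the nearest neighbor rule, with hypothetical data, and with
ties and the choice of metric ignored:

    import numpy as np

    def nearest_neighbor_classify(x, samples, labels):
        # Classify x to the class of the closest stored learning sample.
        distances = np.linalg.norm(samples - x, axis=1)
        return labels[np.argmin(distances)]

    # A small "dictionary" of past experiences (hypothetical data)
    samples = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
    labels = np.array([1, 1, 2, 2])
    print(nearest_neighbor_classify(np.array([0.8, 1.2]), samples, labels))  # -> 1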
From the very beginning of the computer age, researchers have been
interested in how a human being learns, for example, to read English charac-
ters. The study of neurons suggested that a single neuron operates like a linear
classifier, and that a combination of many neurons may produce a complex,
piecewise linear boundary. So, researchers came up with the idea of a learning
machine as shown in Fig. 1-5. The structure of the classifier is given along
with a number of unknown parameters w_0, ..., w_T. The input vector, for
example an English character, is fed, one sample at a time, in sequence. A
teacher stands beside the machine, observing both the input and the output. When
a discrepancy is observed between the machine's output and the true class of the
input, the teacher notifies the
machine, and the machine changes the parameters according to a predesigned
algorithm. Chapter 8 discusses how to change these parameters and how the
parameters converge to the desired values. However, changing a large number
of parameters by observing one sample at a time turns out to be a very
inefficient way of designing a classifier.
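One classical instance of such a predesigned correction algorithm is the
perceptron error-correction rule, whose one-sample-at-a-time nature is exactly
the source of the inefficiency just noted. The sketch below is illustrative
only; the learning rate, the number of passes, and the +/-1 class labels are
assumptions, not the algorithms of Chapter 8.

    import numpy as np

    def sequential_train(samples, labels, passes=10, rate=1.0):
        # labels are +1 or -1; w holds the threshold w_0 followed by one
        # weight per feature of the linear classifier.
        n, d = samples.shape
        w = np.zeros(d + 1)
        x_aug = np.hstack([np.ones((n, 1)), samples])
        for _ in range(passes):
            for x, y in zip(x_aug, labels):      # one sample at a time, in sequence
                output = 1.0 if w @ x > 0 else -1.0
                if output != y:                  # the teacher observes a discrepancy
                    w += rate * y * x            # adjust the parameters
        return w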