



When no parametric structure can be assumed for the density functions, we must use nonparametric techniques such as the Parzen and k-nearest neighbor approaches for estimating density functions. In Chapter 6, we develop the basic statistical properties of these estimates.
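
To make the Parzen approach concrete, here is a minimal sketch of a kernel density estimate, not the book's notation: it assumes a Gaussian kernel and a user-chosen bandwidth h, and the function name and signature are illustrative only. The k-nearest neighbor approach differs in that it fixes the number of neighbors k rather than the window size h.

    import numpy as np

    def parzen_density(x, samples, h):
        """Parzen window estimate of the density p(x) at a query point.

        x       : query point, shape (d,)
        samples : learning samples, shape (n, d)
        h       : kernel bandwidth, a control parameter (assumed given here;
                  its selection is the kind of issue Chapter 7 addresses)
        """
        n, d = samples.shape
        diffs = (samples - x) / h
        # One Gaussian kernel centered on each sample, averaged over all n samples
        kernel_vals = (np.exp(-0.5 * np.sum(diffs ** 2, axis=1))
                       / ((2.0 * np.pi) ** (d / 2) * h ** d))
        return kernel_vals.mean()
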
Then, in Chapter 7, the nonparametric density estimates are applied to classification problems. The main topic in Chapter 7 is the estimation of the Bayes error without assuming any mathematical form for the density functions. In general, nonparametric techniques are very sensitive to their control parameters, and tend to give heavily biased results unless the values of these parameters are carefully chosen. Chapter 7 presents an extensive discussion of how to select these parameter values.
In Fig. 1-2, we presented decision-making as dividing a high-dimensional space. An alternative view is to consider decision-making as a dictionary search. That is, all past experiences (learning samples) are stored in a memory (a dictionary), and a test sample is assigned to the class of the closest sample in the dictionary. This process is called the nearest neighbor classification rule, and it is widely regarded as close to the way a human being makes decisions. Figure 1-4 shows an example of the decision boundary due to this classifier. Again, the classifier divides the space into two regions, but in a somewhat more complex and sample-dependent way than the boundary of Fig. 1-2. This is a nonparametric classifier, discussed in Chapter 7.
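
The dictionary-search view translates almost directly into code. The following is a minimal sketch of the nearest neighbor rule, assuming Euclidean distance as the measure of closeness; the function name and use of NumPy are illustrative choices, not the book's.

    import numpy as np

    def nearest_neighbor_classify(x, samples, labels):
        """1-NN rule: assign x to the class of the closest stored sample.

        x       : test sample, shape (d,)
        samples : stored learning samples (the "dictionary"), shape (n, d)
        labels  : class label of each stored sample, shape (n,)
        """
        # Distance from the test sample to every entry in the dictionary
        distances = np.linalg.norm(samples - x, axis=1)
        # Return the label of the closest entry
        return labels[np.argmin(distances)]

Note that the decision boundary implied by this rule is not computed explicitly anywhere; it emerges from the stored samples themselves, which is why it is more complex and sample-dependent than the boundary of Fig. 1-2.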

From the very beginning of the computer age, researchers have been interested in how a human being learns, for example, to read English characters. The study of neurons suggested that a single neuron operates like a linear classifier, and that a combination of many neurons may produce a complex, piecewise linear boundary. So, researchers came up with the idea of a learning machine, as shown in Fig. 1-5. The structure of the classifier is given, along with a number of unknown parameters w_0, ..., w_T. The input vector, for example an English character, is fed in, one sample at a time, in sequence. A teacher stands beside the machine, observing both the input and the output. When a discrepancy is observed between the desired and actual outputs, the teacher notifies the machine, and the machine changes its parameters according to a predesigned algorithm. Chapter 8 discusses how to change these parameters and how the parameters converge to the desired values. However, changing a large number of parameters by observing one sample at a time turns out to be a very inefficient way of designing a classifier.
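
The text defers the actual update algorithm to Chapter 8. One classical instance of this teacher-driven scheme is the perceptron error-correction rule, sketched below for a linear classifier; the step size eta, the epoch count, and the +1/-1 label encoding are assumptions made for this sketch, not specifics from the text.

    import numpy as np

    def perceptron_train(samples, labels, eta=1.0, epochs=10):
        """Sequential error-correction learning of a linear classifier.

        samples : input vectors, shape (n, d)
        labels  : desired outputs, coded +1 or -1
        eta     : step size (an assumed choice; convergence is a Chapter 8 topic)
        """
        n, d = samples.shape
        w = np.zeros(d + 1)  # parameters w_0 (bias term), w_1, ..., w_d
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                xa = np.concatenate(([1.0], x))   # augment input with 1 for the bias
                if np.sign(w @ xa) != y:          # the "teacher" observes a discrepancy
                    w += eta * y * xa             # correct the parameters toward y
        return w

Because each misclassified sample nudges every parameter only slightly, many passes over the data may be needed, which illustrates the inefficiency the paragraph above points out.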