

Multiclass: The NN error for multiclass problems can also be obtained in a similar way, starting from (7.21) [8].  The result is

$$E\{\Delta\hat{\varepsilon}_N\} \;\cong\; \beta_1\, E_X\!\left\{ |A|^{-1/n}\,\mathrm{tr}\!\big(A\,B_L(X)\big) \right\}, \qquad (7.41)$$

where $B_L(X)$ is defined in (7.42).
Note that $\beta_1$ of (7.41) is the same as $\beta_1$ of (7.37).  This means that the effect of sample size on the bias does not depend on the number of classes.
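Because $\beta_1$ in (7.41) is the same quantity that appears in the two-class result (7.37), the sample-size effect on the bias enters through $\beta_1$ alone and, in this chapter's setting, shrinks roughly like $N^{-2/n}$ as the design-set size $N$ grows ($n$ being the dimensionality).  The following is a minimal numerical sketch of that behavior, assuming a hypothetical two-class problem with spherical Gaussian data and a brute-force 1-NN rule; the dimensionality, mean separation, and sample sizes are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_error(n_design, n_test=2000, dim=4, delta=2.0):
    """Empirical error of the 1-NN rule for two spherical Gaussian classes
    (an assumed test case, not the book's data): equal priors, means
    separated by `delta` along the first coordinate."""
    def sample(n):
        y = rng.integers(0, 2, size=n)
        x = rng.standard_normal((n, dim))
        x[:, 0] += delta * y
        return x, y

    xd, yd = sample(n_design)      # design (reference) samples
    xt, yt = sample(n_test)        # independent test samples
    # squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = ((xt ** 2).sum(1)[:, None] + (xd ** 2).sum(1)[None, :]
          - 2.0 * xt @ xd.T)
    pred = yd[d2.argmin(axis=1)]   # label of the nearest design sample
    return float((pred != yt).mean())

dim = 4
for n in (50, 200, 800, 3200):
    errs = [nn_error(n, dim=dim) for _ in range(10)]
    print(f"N={n:5d}  NN error ~ {np.mean(errs):.3f}   "
          f"N^(-2/n) = {n ** (-2 / dim):.4f}")
```

As $N$ grows, the empirical NN error should drift toward its asymptotic value at roughly the $N^{-2/n}$ rate suggested by $\beta_1$, independently of how many classes are involved.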

                     7.4  Error Estimation

In this section, we return to nonparametric density estimates and use them to design a classifier and estimate its classification error.  Both the Parzen and volumetric kNN approaches will be discussed.  However, because the analysis of the Parzen approach is simpler than that of the kNN approach, the Parzen approach will be presented first in detail, and the kNN approach will then be discussed by comparison with the Parzen approach.
Classification and error estimation using the Parzen density estimate were discussed in Section 7.1.  However, in order to apply this technique effectively to practical problems, we need to know how to determine the necessary parameter values, such as the kernel size, kernel shape, sample size, and threshold.
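As a reminder of the Section 7.1 procedure that the rest of this section parameterizes, a classifier can be built by forming a Parzen estimate of each class-conditional density and thresholding the log-likelihood ratio.  The sketch below assumes a spherical Gaussian kernel with a common size h, equal priors, and a zero threshold; the data, kernel size, and threshold are placeholders for exactly the parameters whose choice is discussed next.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen density estimate at x with a spherical Gaussian kernel of size h
    (one possible kernel shape; choosing h is the subject of this section)."""
    n, dim = samples.shape
    d2 = ((samples - x) ** 2).sum(axis=1)
    kernels = np.exp(-0.5 * d2 / h ** 2) / ((2.0 * np.pi) ** (dim / 2) * h ** dim)
    return kernels.mean()

def parzen_classify(x, design1, design2, h, threshold=0.0):
    """Assign x to class 1 when the log-likelihood ratio exceeds the threshold."""
    p1 = parzen_density(x, design1, h)
    p2 = parzen_density(x, design2, h)
    return 1 if np.log(p1) - np.log(p2) > threshold else 2

# Illustrative two-dimensional data (not from the text).
rng = np.random.default_rng(1)
design1 = rng.standard_normal((100, 2))
design2 = rng.standard_normal((100, 2)) + np.array([2.0, 0.0])
print(parzen_classify(np.array([0.3, 0.1]), design1, design2, h=0.5))
```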

                      Effect of the Kernel Size in the Parzen Approach

Having discussed the optimal volume of the Parzen density estimate in Chapter 6, let us now consider the problem of selecting the kernel size for classification.  However, density estimation and classification are different tasks, and the optimal solution for one might not be optimal for the other.  For example, in density estimation, the mean-square error criterion was used to find the optimal volume.  This criterion tends to weight the high-density area more heavily than the low-density area.  In classification, on the other hand, the relationship between the tails of the two densities is important, and the mean-square error may not be an appropriate criterion.  Despite significant efforts in the past, it is still unclear how to optimize the size of the kernel function for