    E\{\Delta\hat{\varepsilon}\} \cong b_1 \frac{1}{k} + b_2 \left(\frac{k}{N}\right)^{2/n} + b_3 \left(\frac{k}{N}\right)^{4/n} ,        (7.74)
where the bᵢ's are constants.

Several conclusions can be drawn from (7.74). In order to make E{Δε̂} → 0, two conditions, k → ∞ and k/N → 0 as N → ∞, must be satisfied. These conditions are identical to the conditions required for asymptotic consistency of the kNN density estimate [15]. Since k/N ≅ pᵢvᵢ (the density times the volume of the local region), k/N is proportional to rⁿ, and therefore (k/N)^(2/n) is proportional to r². That is, E{Δε̂} ≅ b₁/k + b₂′r² + b₃′r⁴, where b₂′ and b₃′ are constants different from b₂ and b₃. We may compare this with (7.52), E{Δε̂} of the Parzen approach. In both cases, we have terms in r² and r⁴ which are generated by the bias of the density estimate. However, the bias of the kNN error does not have an r⁻ⁿ/N term. This is due to the fact that (7.74) describes the behavior of the kNN error only for large values of k. A series of approximations based on N ≫ k ≫ 1 was applied to obtain E{ε̂(X)} and MSE{ε̂(X)}. If the analysis of the kNN error for small values of k is needed, more complicated expressions must be used. In addition to the r² and r⁴ terms, the kNN bias has a constant term b₁/k, which the Parzen bias does not have. This may be related to the fact that the voting kNN error with a finite k does not converge to the Bayes error, even under asymptotic conditions. This term can be reduced only by increasing k.
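To make the interplay of these terms concrete, the following minimal sketch (in Python; the constants b₁, b₂, b₃ and the dimensionality n are hypothetical choices for illustration, not values from the text) evaluates the bias model of (7.74) while growing k as √N, so that k → ∞ and k/N → 0 hold simultaneously:

# Hypothetical constants and dimensionality, chosen only for illustration.
b1, b2, b3 = 0.5, 0.3, 0.1
n = 8

def knn_bias(k, N):
    # Bias model of (7.74): b1/k + b2 (k/N)^(2/n) + b3 (k/N)^(4/n).
    ratio = k / N
    return b1 / k + b2 * ratio ** (2 / n) + b3 * ratio ** (4 / n)

# Grow k as sqrt(N): k -> infinity while k/N -> 0.
for N in (10**3, 10**4, 10**5, 10**6):
    k = int(N ** 0.5)
    print(f"N = {N:>7}, k = {k:>4}, bias ~ {knn_bias(k, N):.4f}")

The printed values shrink as N grows, but slowly: for large n the (k/N)^(2/n) term, the analogue of the r² bias term above, dominates.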

Effect of parameters: In the Parzen approach, the most effective way to reduce the bias of the error estimate is to adjust the threshold properly. Also, in order to assure that the L method gives an upper bound of the Bayes error, the kernel covariance matrix must be estimated either from a large number of independent samples or by an L type estimation technique.
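One plausible reading of an L type estimation technique, sketched here as an assumption rather than the book's exact procedure, is that the covariance matrix shaping the Parzen kernel for a given sample is recomputed with that sample left out, so that a sample never influences its own kernel:

import numpy as np

def loo_kernel_covariances(X):
    # For each left-out sample i, estimate the kernel covariance from the
    # remaining N-1 design samples (assumed "L type" estimate; illustrative).
    N = X.shape[0]
    return [np.cov(np.delete(X, i, axis=0), rowvar=False) for i in range(N)]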
In terms of threshold selection, the method developed for the Parzen approach may be directly applied to the kNN approach. The kNN density estimates are known to be biased when the size of the design set is limited, and, by choosing an appropriate threshold, one might hope to reduce or eliminate the effect of that bias when classification is performed. There are no usable expressions for the threshold t even in the normal case. However, each of the non-normal methods for threshold selection (Options 2, 3, and 4) is directly applicable to the kNN problem.
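As a rough illustration of threshold adjustment in the kNN setting, the sketch below (a generic construction, not one of Options 2, 3, or 4) applies a variable threshold t to the vote fraction of a voting kNN rule and scores each candidate t with a leave-one-out (L) error estimate; the Gaussian sample data and all parameter values are hypothetical:

import numpy as np

def loo_knn_error(X, y, k, t=0.5):
    # Leave-one-out (L method) error of a voting kNN rule: classify to
    # class 1 when the fraction of class-1 neighbors exceeds the
    # threshold t (t = 0.5 recovers the usual majority vote).
    N = len(X)
    errors = 0
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                    # exclude the test sample itself
        nn = np.argsort(d)[:k]           # its k nearest neighbors
        vote = np.mean(y[nn] == 1)       # fraction voting for class 1
        errors += ((1 if vote > t else 0) != y[i])
    return errors / N

# Hypothetical two-class Gaussian data; sweep t and keep the best L estimate.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
best_err, best_t = min((loo_knn_error(X, y, k=15, t=t), t)
                       for t in np.linspace(0.3, 0.7, 9))
print(f"L estimate {best_err:.3f} at threshold t = {best_t:.2f}")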