Page 376 - Introduction to Statistical Pattern Recognition




                      Edited Nearest Neighbor
Edited kNN: As seen in Fig. 7-3 for the voting kNN approach, the ω_i-samples in the ω_j-region (i ≠ j) could become close neighbors of some of the ω_j-samples, and this produces extra errors. This is the reason why the kNN error is larger than the Bayes error. These extra errors could be reduced by removing the ω_i-samples in the ω_j-region (samples 1, 3, 5 and 6 of Fig. 7-3) from the design sample set. In practice, since the exact location of the Bayes boundary is never known, the above operation is replaced by removing the samples misclassified by the kNN classification (1, 2, 3, 4, 5 and 6 of Fig. 7-3) [21]. The resulting design set is called the edited set. Test samples are classified by the kNN's from the edited set. This algorithm is called the edited kNN procedure. It should be noted, however, that some of the ω_i-samples in the ω_j-region are correctly classified and not removed, and some of the ω_j-samples in the ω_i-region are misclassified and removed. Therefore, the resulting error is not the same as the Bayes error.
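The editing procedure above can be sketched in code. The following is an illustrative NumPy implementation, not the book's own algorithm listing; in particular, excluding each design sample from its own neighbor search (leave-one-out) during the editing pass is an implementation choice assumed here.

```python
import numpy as np

def knn_predict(X_design, y_design, X, k):
    # Voting kNN: classify each row of X by majority vote among its
    # k nearest design samples (Euclidean distance).
    preds = []
    for x in X:
        d = np.linalg.norm(X_design - x, axis=1)
        nearest = np.argsort(d)[:k]
        preds.append(np.bincount(y_design[nearest]).argmax())
    return np.array(preds)

def edit_design_set(X_design, y_design, k):
    # Editing pass: remove every design sample that the kNN rule
    # misclassifies when that sample itself is left out of the search.
    keep = []
    for i in range(len(X_design)):
        mask = np.arange(len(X_design)) != i
        pred = knn_predict(X_design[mask], y_design[mask],
                           X_design[i:i + 1], k)[0]
        if pred == y_design[i]:
            keep.append(i)
    return X_design[keep], y_design[keep]
```

Test samples are then passed to `knn_predict` with the edited set in place of the original design set, as the text describes.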

Asymptotic analysis: In order to analyze the edited kNN procedure, let us study the simplest case of the asymptotic NN. In the original sample set, the populations of ω₁ and ω₂ given X are represented by a posteriori probabilities q₁(X) and q₂(X). Applying the voting NN classification to the ω_i-samples, q_i²(X) (= q_i(X)q_i(X_NN)) are correctly classified, and q_i(X)q_j(X) (= q_i(X)q_j(X_NN)) (j ≠ i) are misclassified. Therefore, keeping only the correctly classified samples, the a posteriori probabilities in the edited set, q̂_i(X), become


                      q̂_i(X) = q_i²(X) / [q₁²(X) + q₂²(X)]                (7.84)
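A quick numeric sketch of Eq. (7.84), assuming the two-class case with q₂(X) = 1 − q₁(X) at a fixed X, shows that editing renormalizes the class proportions by their squared posteriors, which pushes the majority-class posterior toward one:

```python
def edited_posteriors(q1):
    # Eq. (7.84): after NN editing, the class proportions at X are
    # proportional to the squared original posteriors.
    q2 = 1.0 - q1
    s = q1 ** 2 + q2 ** 2
    return q1 ** 2 / s, q2 ** 2 / s
```

For example, a point with q₁(X) = 0.7 in the original set has q̂₁(X) = 0.49/0.58 ≈ 0.845 in the edited set, illustrating why the edited set looks "cleaner" near the Bayes boundary.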


Samples from the original set are classified according to the class of the NN from the edited set. The resulting error is