4.3 Model-Free Techniques


rather slowly with n, approximately as ah², particularly in high dimensions. The number of patterns required to lower the bias significantly may therefore be quite prohibitive.

Figure 4.30. Bounds on the asymptotic k-NN error rate, Pe(k), as a function of the Bayes error, Pe.
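
For the simplest case (two classes, k = 1), such bounds take the form of the classical Cover and Hart (1967) result, which ties the asymptotic 1-NN error rate to the Bayes error; it is stated here in LaTeX for reference:

    \[
      P_e \;\le\; P_e(1) \;\le\; 2\,P_e\,(1 - P_e)
    \]

As k grows (with k/n tending to zero), Pe(k) approaches the Bayes error Pe, so the bounds tighten toward Pe for larger k.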



Another difficulty with the k-NN method is deciding which patterns to keep in the training set as "prototypes" for assessing the neighbourhoods. The edition method has the goal of reducing the number of training set patterns, and therefore the number of distances to be computed, while at the same time improving the discrimination by discarding unreliable patterns from the training set. The method consists of the following steps (a code sketch follows the list):

1. Each pattern in the training set is classified using its k nearest neighbours among the other training set patterns.
2. If the classification obtained differs from the original label, the pattern is discarded from the training set. In this way a new, smaller training set is obtained.
3. The test patterns are classified using the 1-NN rule and the new training set derived in step 2.

With the edition method, patterns near the class boundaries, whose k-NN assignments are unreliable, are eliminated. Devijver and Kittler (1980) showed that the error probability of this method decreases asymptotically with each pass of a sequence of successive editions.
The KNN program allows one to perform k-NN experiments for two-class situations. It was used to perform 50 runs of k-NN classification on the cork stoppers data, with features N and PRTIO, using the leave-one-out method (see section 4.5). An average error rate of 9% was obtained, close to the value reported previously for the normal classifier and with a similar 95% confidence interval.
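
The KNN program itself is not reproduced here; a comparable leave-one-out experiment can be sketched with scikit-learn, using randomly generated placeholder data in place of the cork-stoppers features N and PRTIO:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # Placeholder two-class data standing in for (N, PRTIO); the actual
    # cork-stoppers measurements are not included in this sketch.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1.5, 1, (50, 2))])
    y = np.repeat([0, 1], 50)

    clf = KNeighborsClassifier(n_neighbors=3)
    scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
    print(f"leave-one-out error rate: {1 - scores.mean():.1%}")

Averaging such error rates over repeated runs, as done with the KNN program, yields the kind of estimate and confidence interval quoted above.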