If there are few points around x, the region grows large and the volume V(n) will also be large, thereby yielding a low density value. If there are many points around x, in which case x is a high density point, the volume V(n) will be small.
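As an illustration only (a sketch: the function name and the Euclidean metric are our choices), the following Python fragment computes the usual k-NN density estimate, p(x) ≈ k/(n·V(x)), where V(x) is the volume of the smallest hypersphere centred on x that contains its k nearest training points:

import numpy as np
from math import gamma, pi

def knn_density(x, data, k):
    """k-NN density estimate: p(x) ~ k / (n * V), where V is the volume of the
    smallest hypersphere centred at x that contains the k nearest points."""
    n, d = data.shape
    dists = np.sort(np.linalg.norm(data - x, axis=1))   # Euclidean distances to x
    r = dists[k - 1]                                     # radius reaching the k-th neighbour
    V = (pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d      # volume of a d-ball of radius r
    return k / (n * V)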
The smoothing parameter in this case is k(n), which must grow with n. Fukunaga and Hostetler (1973) derived an expression for k(n), which for normal distributions is $k(n) = k_0\, n^{4/(d+4)}$. The constant $k_0$, initially 1.0 for d=1, decreases with d, to 0.75 for d=4 and to 0.42 for d=32. For instance, the two-class cork stoppers classification with 100 patterns would need about 7 neighbours when using 4 features. As we have done with the h(n) factor in Parzen window estimation, the k(n) factor also has to be adjusted experimentally.
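Evaluating this expression for the cork stoppers example reproduces the figure quoted above (a small sketch; the function name is ours):

def k_of_n(n, d, k0):
    """Fukunaga-Hostetler choice of the number of neighbours for
    normal distributions: k(n) = k0 * n**(4 / (d + 4))."""
    return k0 * n ** (4.0 / (d + 4))

# Cork stoppers example from the text: k0 = 0.75 for d = 4, n = 100 patterns
print(k_of_n(100, 4, 0.75))   # 7.5, i.e. about 7 neighbours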
When used as a pdf estimator, the k-nearest neighbours method, or k-NN method for short, has the following shortcomings: the divergence to infinity of the integral of the density estimate and the computational burden of the method. Rather than using this method for pdf estimation, we will use it simply as a classification method, according to the following nearest neighbour rule: consider the k(n) points that are nearest neighbours of x, using a certain distance metric; the classification of x is then the class label found in majority among the k(n) neighbours. The distance metric used results in different decision boundaries.
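A minimal sketch of this rule in Python (names are ours; the order of the Minkowski norm plays the role of the distance metric):

import numpy as np
from collections import Counter

def knn_classify(x, data, labels, k, order=2):
    """Classify x by majority vote among its k nearest neighbours.
    `order` is the Minkowski norm used as distance metric (2 = Euclidean);
    changing it changes the decision boundaries."""
    dists = np.linalg.norm(data - x, ord=order, axis=1)
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                   # majority class label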
When applying the k-NN method, we are interested in knowing its attainable performance in terms of its error rate Pe(k) for an arbitrarily large population, i.e., for n → ∞, compared with the optimal Bayes error Pe. Bounds on Pe(k) are studied and presented in Duda and Hart (1973) and can be expressed, for two classes and odd k(n), as:

$$P_e \le P_e(k) \le \sum_{i=0}^{(k-1)/2} \binom{k}{i}\left[ P_e^{\,i+1}(1-P_e)^{k-i} + P_e^{\,k-i}(1-P_e)^{i+1} \right].$$
In this formula Pe is the Bayes error. In particular, for the 1-NN case, the error has the following upper bound:

$$P_e(1) \le 2P_e(1-P_e).$$
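Evaluating these bounds numerically, with the upper bound written as above (the helper name is ours), shows how Pe(k) closes in on the Bayes error as k grows:

from math import comb

def knn_error_bound(pe, k):
    """Duda and Hart (1973) upper bound on the asymptotic k-NN error,
    two classes, odd k, as a function of the Bayes error pe."""
    return sum(comb(k, i) * (pe ** (i + 1) * (1 - pe) ** (k - i)
                             + pe ** (k - i) * (1 - pe) ** (i + 1))
               for i in range((k - 1) // 2 + 1))

pe = 0.1
print(2 * pe * (1 - pe))         # 1-NN bound: 0.18
print(knn_error_bound(pe, 1))    # same value from the general formula
print(knn_error_bound(pe, 21))   # closer to the Bayes error Pe = 0.1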
These bounds are shown in Figure 4.30. As can be seen, Pe(k) approaches Pe for increasing values of k, although demanding larger numbers of points, and is never worse than 2Pe. Even in the case of k=1, the nearest neighbour rule, the performance of the technique is quite remarkable. Bear in mind, however, that Pe(k) is the asymptotic value of the overall error rate when the number of available points grows towards infinity. A training set estimate of Pe(k) can have a significant bias relative to this asymptotic value. This issue is discussed in detail in Fukunaga and Hummels (1987), where the effects of dimensionality and metric are presented. An important conclusion of this study is that the bias tends to decay