Page 296 - Introduction to Statistical Pattern Recognition
P. 296

27 8                       Introduction to Statistical Pattern Recognition




                                                                                  (6.1 13)

                            (b)  Uniform:




                       The reader may  check  that  Tl”  2’12  (l-lh~)-~’~ for normal and  (n+2)”’  for
                       uniform  with  m = 1  are  close for  a  large n (10.3  and  10.1 respectively for
                       n = 100).

                            Effect of  parameters: Let us  examine (6.108) and  (6.109) with  m  = 1.
                       These equations reveal how E { dkNN(X)) is affected by  such parameters as n, k,
                       N, and p(X).  The  effect of  k  appears only  in  the  second  term  of  (6.109).
                       When m  = 1 and n is large, T(k+I/n)lT(k) E k““  is close to 1 regardless of the
                       value of  k.  This means that the average distance to the first NN is almost the
                       same as the average distance to the  second NN, and so on.  The effect of N,
                       which  appears  in  the  third  term  of  (6.109),  is  also  minimal,  since
                       T(N+l)lT(N+l+lln)  N-””   1  for  large  n.  The  effect  of  the  location  is
                       observed as p-””(X) in  (6.108).  When n is  large, p-’”’(X) Z 1  regardless of
                       the  value  of  p(X) unless  p(X) is  either  extremely  large  or  small.  Thus,
                       E ( dkNN(X)] is highly influenced only by  n and  I E I.  On the other hand, in the
                       global kNN distance ExE{dkNN(X)), the  I1 I term in v cancels with the  I Z I of
                       Ex{p-””(X)), and  only  n determines the averaged kNN  distance.  This is true
                       because  the  distances  are  normalized  by  C as  in  (6.24).  Table  6-3  shows
                       ExE(dkNN(X)} for  various n, k,  and N for normal  and  uniform  distributions
                       with  covariance matrix  I [18].  The  parameter values  are  n = 10, k = 1,  and
                       N = 100,  unless otherwise indicated.  It  can be  observed from Table 6-3  that
                       the  effects of  k  and N  are not  significant for n = 10.  This behavior is  even
                       more apparent for higher-dimensions.

                            Although the  above results are contrary to our intuition, they  could be
                       better understood by  observing the volume to the kth NN, v~NN, instead of  the
                        distance.  For  example, dzNN/dNN = 2.551l2.319 El.1  and  is  close to  1  for  a
                        10-dimensional normal distribution from Table 6-3.  However, the ratio of the
                        corresponding volumes is V~NNIVNN = (~~NN/~NN)’O E2.6, which  is not close to
                        1.  That  is,  the  effect of  k  on  v~NN is  significant.  The  same is  true  for  the
   291   292   293   294   295   296   297   298   299   300   301