Page 291 - Introduction to Statistical Pattern Recognition

6  Nonparametric Density Estimation                           273













                    where N >> k >> 1 is assumed.  Therefore, the variance and mean-square error
                    of $\hat{p}(X)$ are


                                                                                (6.93)



                                                                                (6.94)



                    Again, in (6.94) the first and second terms are the variance and the squared
                    bias, respectively.  It must be pointed out that the series of approximations used
                    to obtain (6.91)-(6.94) is valid only for large k.  For small k, different and more
                    complex approximations for $E\{\hat{p}(X)\}$ and $\mathrm{Var}\{\hat{p}(X)\}$ must be
                    derived by using (6.87) and (6.88) rather than (6.90).  As in the Parzen case, the
                    second order approximation for the bias and the first order approximation for the
                    variance may be used for simplicity.  Also, note that the MSE of (6.94) becomes zero
                    as $k \to \infty$ and $k/N \to 0$.  These are the conditions for the kNN density
                    estimate to be asymptotically unbiased and consistent [14].
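
The two consistency conditions can be checked numerically. The following is a minimal sketch, not the book's procedure. It assumes the kNN density estimate has the common form (k-1)/(N v(X)), where in one dimension v(X) is the length of the interval around X that reaches the kth nearest sample. The function names, the standard normal test density, and the choice k = sqrt(N) (one schedule for which k grows while k/N shrinks) are all illustrative assumptions.

```python
import math
import random


def knn_density_1d(x, samples, k):
    """kNN density estimate at scalar x, assuming the form (k-1)/(N*v).

    v is the length of the interval centered at x whose half-width is the
    distance to the k-th nearest sample (a 1-D stand-in for the volume).
    """
    dists = sorted(abs(s - x) for s in samples)
    r_k = dists[k - 1]
    return (k - 1) / (len(samples) * 2.0 * r_k)


def empirical_mse(x, n_samples, k, n_trials=200, seed=0):
    """Average squared error of the estimate over repeated normal samples."""
    rng = random.Random(seed)
    true_p = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    sq_err = 0.0
    for _ in range(n_trials):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        sq_err += (knn_density_1d(x, samples, k) - true_p) ** 2
    return sq_err / n_trials


def main():
    # k = sqrt(N): k grows without bound while k/N goes to zero.
    for n in (100, 400, 1600):
        k = int(round(math.sqrt(n)))
        print(f"N={n:5d}  k={k:3d}  empirical MSE={empirical_mse(0.0, n, k):.5f}")


if __name__ == "__main__":
    main()
```

Under these assumptions the printed MSE should shrink as N grows, which is the behavior the two limits describe. The exact values depend on the seed and the number of trials.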



                    Optimal Number of Neighbors

                         Optimal k:  In order to apply the kNN density estimate of (6.68), we
                    need to know what value to select for k.  The optimal k under the approximation
                    of $u \cong pv$ is $\infty$, by minimizing (6.82) with respect to k.  That is, when
                    L(X) is small and $u \cong pv$ holds, the variance dominates the MSE and can be
                    reduced by selecting larger k or larger L(X).  As L(X) becomes larger, the
                    second order term produces the bias and the bias increases with L(X).  The
                    optimal k is determined by the balance between the rate of the variance decrease
                    and the rate of the bias increase.
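
The trade-off described above can be seen in a small experiment. The sketch below is illustrative only: it assumes the estimate (k-1)/(N v(X)) in one dimension, a standard normal density evaluated at X = 0, and a hypothetical helper `mse_vs_k` that measures the empirical MSE for several values of k at a fixed N.

```python
import math
import random


def mse_vs_k(x, n_samples, ks, n_trials=200, seed=0):
    """Empirical MSE of the 1-D kNN density estimate at x for each k in ks.

    Assumes the estimate (k-1)/(N*v), with v twice the distance from x to
    the k-th nearest sample, and samples drawn from a standard normal.
    """
    rng = random.Random(seed)
    true_p = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    sq_err = {k: 0.0 for k in ks}
    for _ in range(n_trials):
        # Sort the distances once per trial and reuse them for every k.
        dists = sorted(abs(rng.gauss(0.0, 1.0) - x) for _ in range(n_samples))
        for k in ks:
            v = 2.0 * dists[k - 1]
            p_hat = (k - 1) / (n_samples * v)
            sq_err[k] += (p_hat - true_p) ** 2
    return {k: total / n_trials for k, total in sq_err.items()}


def main():
    results = mse_vs_k(0.0, 500, [3, 10, 50, 200, 400])
    for k, mse in results.items():
        print(f"k={k:3d}  empirical MSE={mse:.5f}")


if __name__ == "__main__":
    main()
```

With N fixed, small k gives a small local region and a large variance, while k close to N makes the region so wide that the bias takes over. In this setup the smallest empirical MSE is expected at an intermediate k.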