Page 289 - Introduction to Statistical Pattern Recognition

6  Nonparametric Density Estimation                           271




                                                                                (6.81)

                    Equation (6.80) indicates that $\hat{p} = (k-1)/Nv$ is unbiased as long as
                    $u = pv$ holds.  If $k/Nv$ is used instead, the estimate becomes biased.  This
                    is the reason why $(k-1)$ is used in (6.68) instead of $k$.  The variance of
                    $\hat{p}(X)$ can also be computed under the approximation of $u = pv$ as






                                                                                (6.82)


                    Comparison of  (6.29) and  (6.82) shows that  the  variance of  the kNN  density
                    estimate is  larger than the one for the  Parzen  density estimate.  Also,  (6.82)
                    indicates that, in  the kNN  density  estimate, k  must  be  selected larger than  2.
                    Otherwise, a large variance may result.
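
                    The bias and variance claims above can be checked with exact arithmetic.
                    The sketch below is illustrative only. It assumes, in addition to $u = pv$,
                    that the coverage $u$ of the kth nearest neighbor region follows a
                    Beta(k, N-k+1) distribution (the standard result for the kth order
                    statistic of N uniform samples, not restated on this page). The function
                    names `inv_moment` and `knn_relative_moments` and the parameter
                    `numerator_offset` are hypothetical names introduced here.

```python
from fractions import Fraction


def inv_moment(N, k, m):
    """E{u^-m} for u ~ Beta(k, N-k+1) (assumed distribution of the coverage u).

    E{u^-m} = [N (N-1) ... (N-m+1)] / [(k-1) (k-2) ... (k-m)], finite only if k > m.
    """
    if k <= m:
        raise ValueError("E{u^-m} diverges unless k > m")
    result = Fraction(1)
    for j in range(m):
        result *= Fraction(N - j, k - 1 - j)
    return result


def knn_relative_moments(N, k, numerator_offset=1):
    """Mean and variance of p_hat / p for p_hat = (k - numerator_offset) / (N v).

    Uses v = u / p, so p_hat / p = (k - numerator_offset) / (N u).
    numerator_offset=1 is the (k-1)/Nv estimate; numerator_offset=0 is k/Nv.
    """
    scale = Fraction(k - numerator_offset, N)
    mean = scale * inv_moment(N, k, 1)
    second = scale * scale * inv_moment(N, k, 2)
    return mean, second - mean * mean


if __name__ == "__main__":
    N, k = 100, 5
    print("(k-1)/Nv:", knn_relative_moments(N, k))
    print("k/Nv    :", knn_relative_moments(N, k, numerator_offset=0))
```

                    Under these assumptions the mean of $\hat{p}/p$ is exactly 1 for the
                    $(k-1)/Nv$ estimate and $k/(k-1)$ for $k/Nv$. The relative variance of the
                    $(k-1)/Nv$ estimate works out to $(N-k+1)/[N(k-2)]$, roughly $1/(k-2)$ when
                    $N \gg k$, and the second moment does not exist for $k \le 2$, which is
                    consistent with the requirement that $k$ be larger than 2.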

                         Second order approximation: When the second order approximation is
                    needed, (6.79) must be used to relate $u$ and $v$.  However, since $r^2$ and $v$
                    are related by $v = cr^n$, it is difficult to solve (6.79) for $v$, and a series of
                    approximations is necessary.  Since $\hat{p} = (k-1)/Nv$, the computation of the
                    first and second order moments of $\hat{p}(X)$ requires $E\{v^{-1}\}$ and
                    $E\{v^{-2}\}$.  We start by deriving $v^{-1}$ from (6.79) as

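\begin{align}
  v^{-1} &\cong p\left[u^{-1} + \frac{1}{2}\,\alpha\,c^{-2/n}\,v^{2/n}\,u^{-1}\right] \notag \\
         &\cong p\left[u^{-1} + \frac{1}{2}\,\alpha\,(cp)^{-2/n}\,u^{2/n-1}\right], \tag{6.83}
\end{align}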

                    where the approximation of $u = pv$ is applied to the second term to obtain the
                    second line from the first.  Note that the second term was ignored in the first
                    order approximation and therefore is supposed to be much smaller than the first
                    term.  Thus, using $u = pv$ to approximate the second term is justified.  From
                    (6.83)