Page 273 - Introduction to Statistical Pattern Recognition
P. 273

6  Nonparametric Density Estimation                           255



                    chapter is to make the  reader familiar  with  the fundamental  mathematical  pro-
                    perties  related  to  nonparametric  density  estimation  in  preparation  for  the
                    material presented  in Chapter 7.


                    6.1  Parzen Density Estimate
                    Parzen Density Estimate

                         In order to estimate the value of a density  function at a point X, we  may
                    set up a small local region around X, L (X).  Then, the probability coverage (or
                    probability  mass) of  L(X)  may  be  approximated  by  p(X)v  where  v  is  the
                    volume  of  L(X).  This  probability  may  be  estimated  by  drawing  a  large
                    number  of  samples, N, from p(X), counting the  number  of  samples,  k, falling
                    in L (X), and computing k/N.  Equating  these  two probabilities,  we may obtain
                    an estimate of the density function as
                                     n       k(X)       n     k(X)
                                                   or  p(X)= -
                                     p(X)v = -                     .
                                              N                Nv
                    Note  that,  with  a  fixed  v, k  is  a  random  variable  and  is  dependent on  X.  A
                    fixed v  does not imply the same v throughout  the entire space, and vv  could  still
                    vary with X.  However, v is a preset value and is not a random  variable.

                         Kernel  expression:  The  estimate  of  (6.1)  has  another  interpretation.
                    Suppose  that  3  samples, X3, X,,  and X,,  are  found  in L(X) as shown  in  Fig.
                    6-1.  With  I'  and N  given, i(X) becomes  3/Nv. On the other hand, if  we set up
                    a uniform kernel function, IC(.), with  volume  v  and height  l/v around  all exist-
                    ing  samples,  the  average  of  the  values  of  these  kernel  functions  at X  is  also
                    3/Nv. That is, [ 1-41

                                          n      l  N
                                          p(x) = -  K(X  - x;) .                 (6.2)
                                                N  ;=I
                    As  seen  in  Fig.  6-1,  only  the  kernel  functions  around  the  3  samples,
                    X3, X4, and X,, contribute to the summation of (6.2).
                         Once (6.2)  is adopted, the shape of the kernel function could be  selected
                    more  freely,  under  the  condition  K(X) dX  = 1.  For  one-dimensional  cases,
                    we  may  seek  optimality  and  select  a  complex  shape.  However,  in  a  high-
                    dimensional  space, because of  its complexity, the practical  selection of the ker-
   268   269   270   271   272   273   274   275   276   277   278