\[
\text{IMSE} = \int \text{MSE}\{\hat{p}(X)\}\, dX \; . \tag{6.33}
\]

Another possible criterion for obtaining the globally optimal $r$ is $E_X\{\text{MSE}\{\hat{p}(X)\}\} = \int \text{MSE}\{\hat{p}(X)\}\, p(X)\, dX$. The optimization of this criterion can be carried out in the same way as for the IMSE, and it produces a similar but slightly smaller $r$ than the IMSE. This criterion places more weight on the MSE in high-density areas, where the locally optimal $r$'s tend to be smaller.
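As a rough illustration of the difference between the two criteria, the following sketch (not from the text; the standard normal example, the sample sizes, the grids, and all names are illustrative assumptions) estimates $\text{MSE}\{\hat{p}(x)\}$ by Monte Carlo for a uniform-kernel Parzen estimate, then integrates it both unweighted (the IMSE) and weighted by $p(x)$:

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 100, 400                        # samples per estimate, Monte Carlo runs
    xs = np.linspace(-3.0, 3.0, 61)             # evaluation grid
    dx = xs[1] - xs[0]
    p = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)  # true N(0,1) density on the grid

    def mse_curve(r):
        # Monte Carlo estimate of MSE{p_hat(x)} at every grid point, for radius r.
        err2 = np.zeros_like(xs)
        for _ in range(trials):
            X = rng.normal(size=N)
            # uniform kernel: fraction of samples within radius r, over volume v = 2r
            p_hat = (np.abs(xs[:, None] - X[None, :]) <= r).mean(axis=1) / (2 * r)
            err2 += (p_hat - p) ** 2
        return err2 / trials

    radii = np.linspace(0.1, 1.2, 12)
    curves = [mse_curve(r) for r in radii]
    imse = [c.sum() * dx for c in curves]        # IMSE: unweighted integral of MSE
    wmse = [(c * p).sum() * dx for c in curves]  # E_X{MSE}: integral weighted by p
    print("r minimizing IMSE:    ", radii[np.argmin(imse)])
    print("r minimizing E_X{MSE}:", radii[np.argmin(wmse)])

The two minimizers are typically close; the analysis in the text predicts a slightly smaller $r$ for the density-weighted criterion.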
Since we have computed the bias and variance of $\hat{p}(X)$ in (6.18) and (6.19), $\text{MSE}\{\hat{p}(X)\}$ may be expressed as
\[
\text{MSE}\{\hat{p}(X)\} = \left[ E\{\hat{p}(X)\} - p(X) \right]^2 + \text{Var}\{\hat{p}(X)\} \; . \tag{6.34}
\]
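The decomposition in (6.34) is easy to verify numerically. The sketch below (illustrative assumptions throughout: an N(0,1) density, a single evaluation point, and a uniform kernel) checks that the Monte Carlo MSE of $\hat{p}(x_0)$ equals the squared bias plus the variance:

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials, r, x0 = 200, 5000, 0.5, 1.0
    p_true = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density at x0

    X = rng.normal(size=(trials, N))                  # 'trials' independent sample sets
    # uniform-kernel Parzen estimate at x0: count within radius r, over volume 2r
    p_hat = (np.abs(X - x0) <= r).mean(axis=1) / (2 * r)

    bias2 = (p_hat.mean() - p_true) ** 2
    var = p_hat.var()                                 # population variance (ddof=0)
    mse = np.mean((p_hat - p_true) ** 2)
    print(f"bias^2 + Var = {bias2 + var:.6f},  MSE = {mse:.6f}")  # the two agree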
In this section, only the uniform kernel function is considered. This is because the Parzen density estimate with the uniform kernel is more directly related to the $k$ nearest neighbor density estimate, which makes the comparison of the two easier. Since the normal and uniform kernels yield similar first and second order moments of $\hat{p}(X)$, the normal kernel function may be treated in the same way as the uniform kernel, and both produce similar results.
When the first order approximation is used, $\hat{p}(X)$ is unbiased as in (6.18), and therefore $\text{MSE} = \text{Var} = p/(Nv) - p^2/N$ as in (6.29). This criterion value is minimized by selecting $v = \infty$ for a given $N$ and $p$. That is, as long as the density function is linear in $L(X)$, the variance dominates the MSE of the density estimate and can be reduced by selecting a larger $v$. However, as soon as $L(X)$ is expanded and picks up the second order term of (6.10), the bias starts to appear in the MSE, and it grows with $r^2$ (or $v^{2/n}$) as in (6.18). Therefore, in minimizing the MSE, we select the best compromise between the bias and the variance.
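This compromise can be seen in a small experiment. Under the same illustrative assumptions as above, the empirical MSE at the mode of an N(0,1) density first falls as $r$ grows (the variance term $p/(Nv)$ shrinks) and then rises again once the $r^2$ bias of (6.18) takes over:

    import numpy as np

    rng = np.random.default_rng(2)
    N, trials, x0 = 100, 2000, 0.0
    p_true = 1.0 / np.sqrt(2 * np.pi)   # N(0,1) density at its mode

    for r in (0.05, 0.2, 0.5, 1.0, 2.0):
        X = rng.normal(size=(trials, N))
        p_hat = (np.abs(X - x0) <= r).mean(axis=1) / (2 * r)
        print(f"r = {r:4.2f}   MSE = {np.mean((p_hat - p_true)**2):.5f}")

In a typical run the MSE is minimized at an intermediate radius (here near $r \approx 0.5$), which is exactly the bias-variance compromise described above.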
In order to include the effect of the bias in our discussion, we have no choice but to select the second order approximation in (6.18); otherwise, the MSE criterion does not depend on the bias term. On the other hand, the variance term is included in the MSE no matter which approximation of (6.19) is used, the first or the second order. If the second order approximation is used, the accuracy of the variance may be improved. However, the degree of improvement may not warrant the extra complexity which the second order approximation brings in. Furthermore, it should be remembered that the optimal $r$ will be a function of $p(X)$. Since we never know the true value of