Page 285 - Introduction to Statistical Pattern Recognition

6  Nonparametric Density Estimation                           267




                                                                                (6.60)


                    or
                    \lambda_k^2 + \lambda_k \Bigl( \sum_{i=1}^{n} \lambda_i \Bigr) = -\frac{\mu}{\sigma^2} \quad (k = 1, \ldots, n) .         (6.61)
                    In order to satisfy (6.61), all λ_i's must be equal.  Since |A| = 1, the solution
                    of (6.61) must be

                                                 A=I.                           (6.62)
                    That is, in the transformed Z-space, the optimal matrix A_Z is I for B_Z = I.
                    Therefore, the optimal matrix A to use in the original X-space is identical to B
                    of (6.51) [5].  The neighborhoods should take the same ellipsoidal shape as the
                    underlying distribution.  For the normal distribution we see that the covariance
                    matrix B = Σ is indeed optimal for A.
                         It is important to notice that (6.62) is the locally optimal metric regardless
                    of the location, because IMSE* of (6.54) is minimized not after but before
                    taking the integration.  The same result can be obtained by minimizing MSE*
                    of (6.38).



                    Normal Case

                         In order to get an idea of what kind of numbers should be used for r, in
                    this section let us compute the optimal r for a normal distribution.  The partial
                    derivatives ∇p(X) and ∇²p(X) for N_X(M, Σ) are

                              \nabla p(X) = -p(X)\,\Sigma^{-1}(X - M) ,                                        (6.63)

                              \nabla^2 p(X) = p(X)\,[\Sigma^{-1}(X - M)(X - M)^T \Sigma^{-1} - \Sigma^{-1}] .  (6.64)
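
The closed forms (6.63) and (6.64) can be checked numerically against finite differences. The following is a minimal sketch, assuming NumPy is available; the function names (`normal_pdf`, `grad_closed_form`, `hess_closed_form`, `grad_numeric`, `hess_numeric`) and the values of M, Σ, and X are illustrative choices, not taken from the text.

```python
import numpy as np


def normal_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at the point x."""
    n = mean.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm


def grad_closed_form(x, mean, cov):
    """Eq. (6.63): -p(X) Sigma^{-1} (X - M)."""
    return -normal_pdf(x, mean, cov) * np.linalg.solve(cov, x - mean)


def hess_closed_form(x, mean, cov):
    """Eq. (6.64): p(X) [Sigma^{-1}(X-M)(X-M)^T Sigma^{-1} - Sigma^{-1}]."""
    cov_inv = np.linalg.inv(cov)
    v = cov_inv @ (x - mean)
    return normal_pdf(x, mean, cov) * (np.outer(v, v) - cov_inv)


def grad_numeric(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g


def hess_numeric(f, x, h=1e-4):
    """Central-difference Hessian, built column by column from gradients."""
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x)
        e[i] = h
        hess[:, i] = (grad_numeric(f, x + e, h) - grad_numeric(f, x - e, h)) / (2.0 * h)
    return hess


# Arbitrary illustrative values (assumptions, not from the text).
mean = np.array([1.0, -0.5])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])
x = np.array([0.3, 0.8])
p = lambda y: normal_pdf(y, mean, cov)

grad_err = np.max(np.abs(grad_closed_form(x, mean, cov) - grad_numeric(p, x)))
hess_err = np.max(np.abs(hess_closed_form(x, mean, cov) - hess_numeric(p, x)))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both errors should be small relative to the size of p(X) itself; the remaining discrepancy comes from the finite-difference step sizes, not from the formulas.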
                    For the simplest case in which M = 0 and Σ = I,


                              \operatorname{tr}\{\nabla^2 p(X)\} = p(X)(X^T X - n) = p(X)\Bigl( \sum_{i=1}^{n} x_i^2 - n \Bigr) .     (6.65)
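
The step from (6.64) to (6.65) can be spelled out as follows. This is a short worked sketch using only the trace identities tr{XX^T} = X^T X and tr{I} = n for an n-dimensional X:

```latex
\begin{align*}
\nabla^2 p(X) &= p(X)\,[X X^T - I]
    && \text{(6.64) with } M = 0,\ \Sigma = I \\
\operatorname{tr}\{\nabla^2 p(X)\}
  &= p(X)\,\bigl(\operatorname{tr}\{X X^T\} - \operatorname{tr}\{I\}\bigr) \\
  &= p(X)\,(X^T X - n)
   = p(X)\Bigl(\sum_{i=1}^{n} x_i^2 - n\Bigr)
\end{align*}
```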
                    Note that the optimal A is also I in this case.  It is easy to show that, if
                    p(X) = N_X(0, I), then p²(X) = 2^{-n/2} (2π)^{-n/2} N_X(0, I/2).  Therefore,