
336                        Introduction to Statistical Pattern Recognition


densities becomes a quadratic classifier, resulting in an error much higher than the Bayes error. As a result, the curves of Fig. 7-10 are significantly different from those of Fig. 7-9, indicating that the selection of a proper r for non-normal cases could be more critical than for normal cases. Nevertheless, the Parzen classification does provide usable bounds on the Bayes error.
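As an illustration of this procedure (not the book's own experiment; the class means, sample sizes, and kernel sizes r below are hypothetical), one can estimate both class densities with spherical normal Parzen kernels and classify by comparing the two estimates, observing how the resulting error depends on r:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical two-class normal data with equal priors, for illustration only
n_train, n_test = 200, 2000
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
x1 = rng.normal(size=(n_train, 2)) + m1
x2 = rng.normal(size=(n_train, 2)) + m2
t1 = rng.normal(size=(n_test, 2)) + m1
t2 = rng.normal(size=(n_test, 2)) + m2

def parzen(pts, train, r):
    # Parzen density estimate with a spherical normal kernel of size r (2-D)
    d2 = ((pts[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / r**2).mean(1) / (2 * np.pi * r**2)

def error(r):
    # classify each test point to the class with the larger Parzen estimate
    e1 = (parzen(t1, x1, r) < parzen(t1, x2, r)).mean()
    e2 = (parzen(t2, x2, r) < parzen(t2, x1, r)).mean()
    return 0.5 * (e1 + e2)

for r in (0.2, 0.5, 1.0):
    print(r, error(r))
```

For this configuration the Bayes error is about 0.159 (means two standard deviations apart), and the Parzen errors stay in its neighborhood over a range of r.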

Selection of the kernel shape: An alternative way of compensating for the biases of the error estimate is the selection of the kernel shape. Equations (7.50) and (7.51) suggest that, if the kernel covariances are selected such that $\alpha_1(X) = \alpha_2(X)$, all terms which are independent of the sample size may be eliminated from the bias expression. Hence, we must find positive definite matrices $A_1$ and $A_2$ such that, from (6.13),


$$\frac{\mathrm{tr}\{A_1 \nabla^2 p_1(X)\}}{p_1(X)} = \frac{\mathrm{tr}\{A_2 \nabla^2 p_2(X)\}}{p_2(X)} \qquad (7.62)$$

In general, the $\nabla^2 p_i(X)/p_i(X)$'s are hard to obtain. However, when $p_i(X)$ is normal,

$$\frac{\nabla^2 p_i(X)}{p_i(X)} = \Sigma_i^{-1}(X-M_i)(X-M_i)^T \Sigma_i^{-1} - \Sigma_i^{-1} \qquad (7.63)$$


                      Therefore, we may obtain a solution of  (7.62) in  terms of these expected vec-
                      tors and covariance matrices.
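As a numerical check of (7.63) under assumed parameters (a two-dimensional normal with a hypothetical mean $M$ and covariance $\Sigma$), one can compare a finite-difference Hessian of $p(X)$, divided by $p(X)$, against the closed form:

```python
import numpy as np

def normal_pdf(x, m, cov_inv, cov_det):
    # n-dimensional normal density N(m, cov)
    n = len(m)
    d = x - m
    return np.exp(-0.5 * d @ cov_inv @ d) / np.sqrt((2 * np.pi) ** n * cov_det)

def hessian_fd(f, x, eps=1e-3):
    # central finite-difference Hessian of a scalar function f at x
    n = len(x)
    h = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * eps
            ej = np.eye(n)[j] * eps
            h[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return h

# hypothetical parameters for illustration
m = np.array([1.0, -0.5])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
cov_inv = np.linalg.inv(cov)
cov_det = np.linalg.det(cov)

x = np.array([0.5, 0.2])
p = normal_pdf(x, m, cov_inv, cov_det)

# closed form (7.63): Hessian(p)/p = S^-1 (x-m)(x-m)^T S^-1 - S^-1
d = x - m
closed = cov_inv @ np.outer(d, d) @ cov_inv - cov_inv
numeric = hessian_fd(lambda y: normal_pdf(y, m, cov_inv, cov_det), x) / p

print(np.max(np.abs(closed - numeric)))
```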

Before going to the general solution of (7.62), let us look at the simplest case, where $\Sigma_1 = \Sigma_2 = \Sigma$ and $A_1 = A_2 = \Sigma$. Using (7.63), (7.62) becomes

$$(X-M_1)^T \Sigma^{-1} (X-M_1) = (X-M_2)^T \Sigma^{-1} (X-M_2) \qquad (7.64)$$
which is satisfied by the $X$'s located on the Bayes boundary. On the other hand, since the integration of (7.45) with respect to $\omega$ results in
$$\int \left[ E\{\Delta h\}\,\delta(h) + \frac{1}{2} E\{\Delta h^2\}\,\frac{\partial \delta(h)}{\partial h} \right] (P_1 p_1 - P_2 p_2)\, dX \;,$$
the bias is generated only by $E\{\Delta h(X)\}$ and $E\{\Delta h^2(X)\}$ on the boundary. Therefore, the selection of $A_1 = A_2 = \Sigma$ seems to be a reasonable choice. Indeed, the error curve of Fig. 7-7(a) shows little bias for large r, without adjusting the threshold.
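This special case can be checked numerically. With $A_i = \Sigma$, substituting (7.63) gives $\mathrm{tr}\{\Sigma\,\nabla^2 p_i(X)\}/p_i(X) = (X-M_i)^T\Sigma^{-1}(X-M_i) - n$, so (7.62) reduces to equality of the two Mahalanobis distances, which holds exactly on the equal-covariance Bayes boundary. A minimal sketch, with hypothetical means and shared covariance:

```python
import numpy as np

# hypothetical shared covariance and class means
cov = np.array([[1.5, 0.2], [0.2, 0.8]])
cov_inv = np.linalg.inv(cov)
m1 = np.array([0.0, 0.0])
m2 = np.array([2.0, 1.0])
n = 2  # dimensionality

def alpha(x, m):
    # tr{Sigma (S^-1 d d^T S^-1 - S^-1)} = d^T S^-1 d - n, from (7.63) with A = Sigma
    d = x - m
    return d @ cov_inv @ d - n

# the midpoint of M1 and M2 lies on the equal-covariance Bayes boundary
# (equal Mahalanobis distance to both means)
x = 0.5 * (m1 + m2)
print(abs(alpha(x, m1) - alpha(x, m2)))
```

Off the boundary the two sides of (7.62) differ, which is consistent with the bias being generated only on the boundary, where the delta function concentrates.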
The general solution of (7.62) is very hard to obtain. However, since (7.62) is a scalar equation, there are many possible solutions. Let us select a solution of the form