$$A_i = \Sigma_i + \gamma_i (X - M_i)(X - M_i)^T \; , \qquad (7.65)$$

where $\gamma_i$ is a constant to be determined by solving (7.62). Substituting (7.63) and (7.65) into (7.62) and simplifying give

$$d_1^{-2}(X, M_1)\left[1 + \gamma_1 d_1^2(X, M_1)\right] = d_2^{-2}(X, M_2)\left[1 + \gamma_2 d_2^2(X, M_2)\right] \; , \qquad (7.66)$$
where $d_i^2(X, M_i) = (X - M_i)^T \Sigma_i^{-1} (X - M_i)$. If we could select $\gamma_i d_i^2(X, M_i) = -1$, (7.66) would be satisfied. However, since $(X - M_i)^T A_i^{-1} (X - M_i) = d_i^2(X, M_i)/[1 + \gamma_i d_i^2(X, M_i)]$ for the $A_i$ of (7.65) from (2.160), $\gamma_i d_i^2(X, M_i) > -1$ must be satisfied for $A_i$ to be positive definite. A simple compromise to overcome this inconsistency is to select a number slightly larger than $-1$ for $\gamma_i d_i^2(X, M_i)$. This makes $a_1(X) - a_2(X)$ small, although not zero. This selection of the kernel covariance was tested in the following experiment.
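
As a concrete illustration, the following sketch builds the kernel covariance of (7.65) with $\gamma_i d_i^2(X, M_i)$ pinned to a value above $-1$, and numerically checks both the quadratic-form identity quoted from (2.160) and positive definiteness. The function name, the $-0.8$ target (matching Experiment 8 below), and the standard-normal test point are illustrative choices, not the book's.

```python
import numpy as np

def kernel_covariance(x, mean, cov, target=-0.8):
    """Sample-dependent kernel covariance A = Sigma + gamma (x-M)(x-M)^T of
    (7.65), with gamma chosen so that gamma * d^2(x, M) equals `target`;
    any target > -1 keeps A positive definite."""
    diff = x - mean
    d2 = diff @ np.linalg.inv(cov) @ diff   # d^2(X, M) = (X-M)^T Sigma^{-1} (X-M)
    gamma = target / d2                     # so that gamma * d^2(x, M) = target
    return cov + gamma * np.outer(diff, diff), gamma, d2

# Check the identity from (2.160): (X-M)^T A^{-1} (X-M) = d^2 / (1 + gamma d^2).
rng = np.random.default_rng(0)
n = 8
mean, cov = np.zeros(n), np.eye(n)
x = rng.normal(size=n)
A, gamma, d2 = kernel_covariance(x, mean, cov)
lhs = (x - mean) @ np.linalg.inv(A) @ (x - mean)
print(np.isclose(lhs, d2 / (1 + gamma * d2)))   # True
print(np.all(np.linalg.eigvalsh(A) > 0))        # True: gamma * d^2 = -0.8 > -1
```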


Experiment 8: Estimation of the Parzen error, $\hat\varepsilon$
      Data: I-I, I-4I, I-Λ (Normal, $n = 8$)
      Sample size: $N_1 = N_2 = 100$ (Design)
                   $N_1 = N_2 = 1000$ (Test)
      Kernel: Normal, $A_i$ of (7.65), $\gamma_i d_i^2 = -0.8$
      Kernel size: $r = 0.6$–$2.4$
      Threshold: $t = 0$
      Results: Fig. 7-11
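
A minimal sketch of the computation this experiment calls for is given below: Parzen density estimates built from per-sample kernels $r^2 A_j$, compared at threshold $t = 0$. The class parameters are a hypothetical stand-in patterned on Data I-I (unit covariances, means 2.56 apart in eight dimensions); the book's exact data sets, the $|A_i| = |\Sigma_i|$ normalization discussed below, and its experimental procedure are not reproduced.

```python
import numpy as np
from scipy.stats import multivariate_normal

def parzen_density(T, X, mean, cov, r, target=-0.8):
    """Parzen estimate at the rows of T from design samples X, with the
    sample-dependent kernel covariance r^2 * A_j, A_j as in (7.65)."""
    cov_inv = np.linalg.inv(cov)
    p = np.zeros(len(T))
    for xj in X:
        diff = xj - mean
        gamma = target / (diff @ cov_inv @ diff)   # gamma * d^2 = target
        A = cov + gamma * np.outer(diff, diff)
        p += multivariate_normal.pdf(T, mean=xj, cov=r * r * A)
    return p / len(X)

# Hypothetical stand-in for Data I-I: unit covariances, means 2.56 apart.
rng = np.random.default_rng(1)
n, Nd, Nt = 8, 100, 1000
M1, M2 = np.zeros(n), np.r_[2.56, np.zeros(n - 1)]
S1 = S2 = np.eye(n)
X1, X2 = rng.normal(size=(Nd, n)) + M1, rng.normal(size=(Nd, n)) + M2
T1, T2 = rng.normal(size=(Nt, n)) + M1, rng.normal(size=(Nt, n)) + M2

for r in (0.6, 1.2, 1.8, 2.4):
    # Threshold t = 0: classify by the sign of ln p1_hat - ln p2_hat.
    e1 = np.mean(parzen_density(T1, X1, M1, S1, r) <= parzen_density(T1, X2, M2, S2, r))
    e2 = np.mean(parzen_density(T2, X1, M1, S1, r) >  parzen_density(T2, X2, M2, S2, r))
    print(f"r = {r:.1f}   estimated error = {(e1 + e2) / 2:.3f}")
```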


The optimal kernels given in (7.65) were scaled to satisfy $|A_i| = |\Sigma_i|$, allowing direct comparison with the results obtained using the more conventional kernel $A_i = \Sigma_i$ (also shown in Fig. 7-11). The results for Data I-4I and I-Λ indicate that, although the estimates seem less stable at smaller values of $r$, as $r$ grows the results using (7.65) remain close to the Bayes error while the results using $A_i = \Sigma_i$ degrade rapidly. This implies that the $r^2$ and $r^4$ terms of (7.50) and (7.51) have been effectively reduced. Note that for Data I-I (Fig. 7-11(a)), the distributions were chosen so that $a_1(X) = a_2(X)$ on the Bayes decision boundary. As a result, the $r^2$ and $r^4$ terms of (7.50) and (7.51) are already zero, and no improvement is observed by changing the kernel. These experimental results indicate the potential importance of the kernel covariance in designing Parzen classifiers.
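
The determinant normalization above follows from the matrix determinant lemma: for the rank-one update of (7.65), $|A_i| = |\Sigma_i|(1 + \gamma_i d_i^2)$, so with $\gamma_i d_i^2 = -0.8$ each $A_i$ needs only a fixed scalar rescaling. A short sketch (function name ours):

```python
import numpy as np

def scale_to_det(A, Sigma):
    """Rescale A so that |A| = |Sigma|, as done before comparing against
    the conventional kernel A_i = Sigma_i."""
    n = A.shape[0]
    return A * (np.linalg.det(Sigma) / np.linalg.det(A)) ** (1.0 / n)

# With Sigma = I_8 and gamma * d^2 = -0.8, |A| = 0.2 |Sigma| before scaling.
Sigma = np.eye(8)
v = np.r_[1.0, np.zeros(7)]                 # d^2(X, M) = 1 for this choice
A = Sigma + (-0.8) * np.outer(v, v)
print(np.isclose(np.linalg.det(scale_to_det(A, Sigma)), np.linalg.det(Sigma)))  # True
```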