Page 344 - Introduction to Statistical Pattern Recognition



normal and uniform kernels, respectively. The reason why first- and second-order approximations are used for the variance and bias, respectively, was discussed in Chapter 6. Adopting the second-order approximation for the variance would give a more accurate but more complex expression for (7.49). Substituting (7.48) and (7.49) into (7.46) and (7.47),
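$$
E\{\Delta h(X)\} \cong \frac{1}{2} r^2 (\alpha_2 - \alpha_1) + \frac{1}{8} r^4 (\alpha_1^2 - \alpha_2^2) - \Delta t + \frac{r^{-n}}{2N} \left[ \frac{s_1}{p_1} - \frac{s_2}{p_2} \right] \tag{7.50}
$$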
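$$
E\{\Delta h^2(X)\} \cong \left[ \frac{1}{2} r^2 (\alpha_2 - \alpha_1) - \Delta t \right]^2 - \frac{\Delta t}{4} r^4 (\alpha_1^2 - \alpha_2^2) + \cdots \tag{7.51}
$$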

Note that, from (6.18) and (6.19), the terms associated with $r^2\alpha_i$ are generated by the bias of the density estimate, and the terms associated with $r^{-n}/N$ come from the variance. The threshold adjustment $\Delta t$ is a constant selected independently.

Now, substituting (7.50) and (7.51) into (7.45) and carrying out the integration, the bias is expressed in terms of $r$ and $N$ as
$$
E\{\Delta\hat{\varepsilon}\} \cong a_1 r^2 + a_2 r^4 + a_3 r^{-n}/N . \tag{7.52}
$$
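
The trade-off expressed by (7.52) can be illustrated numerically. The following is a minimal Python sketch, not the book's procedure. The constants `a1`, `a2`, `a3`, the dimensionality `n = 4`, the grid of $r$ values, and the helper names `error_bias` and `best_r_on_grid` are all hypothetical, chosen only for illustration, and the constants are assumed to be positive. The sketch evaluates the right-hand side of (7.52) on a grid and locates the minimizing $r$ for several sample sizes.

```python
import numpy as np


def error_bias(r, N, n, a1, a2, a3):
    """Right-hand side of (7.52): a1*r^2 + a2*r^4 + a3*r^(-n)/N."""
    r = np.asarray(r, dtype=float)
    bias_terms = a1 * r**2 + a2 * r**4      # from the bias of the density estimates
    variance_term = a3 * r**(-n) / N        # from the variance of the density estimates
    return bias_terms + variance_term


def best_r_on_grid(N, n, a1, a2, a3, r_grid=None):
    """Return the grid value of r that minimizes (7.52) for a given N."""
    if r_grid is None:
        # Hypothetical search range for r, with a step of 0.001.
        r_grid = np.linspace(0.2, 3.0, 2801)
    values = error_bias(r_grid, N, n, a1, a2, a3)
    return float(r_grid[np.argmin(values)])


# Hypothetical constants; the true values depend on the distributions
# and kernel shapes and are not available in closed form.
a1, a2, a3 = 0.05, 0.01, 0.5
n = 4  # assumed dimensionality

for N in (50, 200, 800):
    r_star = best_r_on_grid(N, n, a1, a2, a3)
    bias_at_min = float(error_bias(r_star, N, n, a1, a2, a3))
    print(f"N={N:4d}  minimizing r={r_star:.3f}  bias at minimum={bias_at_min:.4f}")
```

Under these assumptions the minimizing $r$ shrinks as $N$ grows, because only the variance term is scaled by $1/N$. If `a2` is set to zero, setting the derivative of (7.52) with respect to $r$ to zero gives $r = [n a_3 / (2 a_1 N)]^{1/(n+2)}$, which the grid search should reproduce to within the grid spacing.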

Here, the constants $a_1$, $a_2$, and $a_3$ are obtained by evaluating the indicated integral expression in (7.45), assuming for simplicity that the decision threshold $t$ is set to zero. Because of the complexity of the expressions, explicit evaluation is not possible. However, the constants are functions only of the distributions and the kernel shapes, $A_i$, and are completely independent of the sample size and the smoothing parameter, $r$. Hence, (7.52) shows how changes in $r$ and $N$ affect the error performance of the classifier. The $a_1 r^2$ and $a_2 r^4$ terms indicate how biases in the density estimates influence the performance of the classifier, while the $a_3 r^{-n}/N$ term reflects the role of the variance of the density estimates. For small values of $r$, the variance term dominates (7.52), and the observed error rates are significantly above the Bayes error. As $r$ grows, however, the variance term decreases while the $a_1 r^2$ and $a_2 r^4$ terms play an increasingly significant role. Thus, for a typical plot of the observed error rate versus $r$, $\hat{\varepsilon}$ decreases for small values of $r$ until a minimum point is