classification. One of the ways to overcome this difficulty is to determine the optimal kernel size experimentally. Assuming that the kernel function of (6.3) is adopted with r as the size control parameter, we may repeat the estimation of the classification error by both the L and R methods for various values of r, and plot the results vs. r. The major drawback of this approach is that the estimation procedure must be repeated completely for each value of r.
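As a rough illustration of this procedure, the following sketch shows one way the Parzen density estimate and the R (resubstitution) and L (leave-one-out) error counts might be computed. It is a minimal Python rendering under assumed conventions, not the book's code: the function names are invented, and the normal kernel is taken to have covariance r²Aᵢ for class i.

    import numpy as np

    def parzen_density(x, samples, A, r):
        # Parzen estimate of p(x): average of normal kernels with covariance r^2 * A
        # centered at each design sample (an assumed form of the kernel in (6.3)).
        n = samples.shape[1]
        cov = (r ** 2) * A
        inv_cov = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        d = samples - x                               # offsets to every design sample
        q = np.einsum('ij,jk,ik->i', d, inv_cov, d)   # squared Mahalanobis distances
        log_kernel = -0.5 * (q + logdet + n * np.log(2.0 * np.pi))
        return np.mean(np.exp(log_kernel))

    def parzen_errors(X1, X2, A1, A2, r, t=0.0):
        # Returns (R, L) error estimates: R tests each sample with itself kept in
        # its own class's design set, L with it removed (leave-one-out).
        err_R = err_L = 0
        for i, x in enumerate(X1):                    # class-1 test samples
            p2 = parzen_density(x, X2, A2, r)
            hR = -np.log(parzen_density(x, X1, A1, r) / p2)
            hL = -np.log(parzen_density(x, np.delete(X1, i, 0), A1, r) / p2)
            err_R += hR > t                           # h(x) > t: classified as class 2
            err_L += hL > t
        for i, x in enumerate(X2):                    # class-2 test samples
            p1 = parzen_density(x, X1, A1, r)
            hR = -np.log(p1 / parzen_density(x, X2, A2, r))
            hL = -np.log(p1 / parzen_density(x, np.delete(X2, i, 0), A2, r))
            err_R += hR <= t                          # h(x) <= t: classified as class 1
            err_L += hL <= t
        N = len(X1) + len(X2)
        return err_R / N, err_L / N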

Experiment 4: Estimation of the Parzen errors, L and R
    Data: I-I, I-4I, I-Λ (Normal, n = 8)
    Sample size: N₁ = N₂ = 100
    No. of trials: τ = 10
    Kernel: Normal with A₁ = Σ₁, A₂ = Σ₂
    Kernel size: r = 0.6-3.0
    Threshold: t = 0
    Results: Fig. 7-7 [1], [2]
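A driver in the spirit of Experiment 4 might then sweep r and average over trials, using the functions sketched above. This is a hypothetical reconstruction: Data I-I is simulated here as two n = 8 normal distributions with identity covariances and means separated by 2.56 along the first coordinate, its usual specification; the remaining parameters follow the list above.

    rng = np.random.default_rng(0)
    n, N, trials = 8, 100, 10
    m1 = np.zeros(n)
    m2 = np.r_[2.56, np.zeros(n - 1)]     # Data I-I style mean difference (assumed)
    A1 = A2 = np.eye(n)                   # kernel covariances matched to the class covariances

    for r in np.arange(0.6, 3.01, 0.2):   # kernel-size sweep, r = 0.6 to 3.0
        R_avg = L_avg = 0.0
        for _ in range(trials):           # tau = 10 independent trials
            X1 = rng.multivariate_normal(m1, np.eye(n), N)
            X2 = rng.multivariate_normal(m2, np.eye(n), N)
            eR, eL = parzen_errors(X1, X2, A1, A2, r, t=0.0)
            R_avg += eR / trials
            L_avg += eL / trials
        print(f"r = {r:.1f}:  R = {R_avg:.3f}  L = {L_avg:.3f}")

Plotting the two averages against r should reproduce the qualitative behavior of Fig. 7-7: the L and R estimates bracket the Bayes error only over a limited range of r.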

In Fig. 7-7, the upper and lower bounds of the Bayes error were obtained by the L and R methods, respectively. As seen in Fig. 7-7, the error estimates are very sensitive to r, except for the Data I-I case. Unless a proper r is chosen, the estimates are heavily biased and do not necessarily bound the Bayes error.

In order to understand why the error estimates behave as in Fig. 7-7, and to provide intelligent guidelines for parameter selection, we need a more detailed analysis of the Parzen error estimation procedure.

Effect of the density estimate: In general, the likelihood ratio classifier is expressed by

$$h(X) = -\ln \frac{p_1(X)}{p_2(X)} \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; t \,, \tag{7.43}$$


where t is the threshold. When the estimates of p₁(X) and p₂(X) are used,

$$\hat{h}(X) = -\ln \frac{\hat{p}_1(X)}{\hat{p}_2(X)} = h(X) + \Delta h(X) \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; \hat{t} \,, \tag{7.44}$$

where t̂ is the adjusted threshold. The discriminant function ĥ(X) is a random variable and deviates from h(X) by Δh(X). The effect of Δh(X) on the classification error can be evaluated from (5.65).
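To make the dependence on the density estimates explicit, one supplementary step (added here for illustration, not part of the original passage) is a first-order expansion of Δh(X) in the estimation errors Δpᵢ(X) = p̂ᵢ(X) − pᵢ(X):

$$\Delta h(X) = \hat{h}(X) - h(X) = -\ln\!\left(1 + \frac{\Delta p_1(X)}{p_1(X)}\right) + \ln\!\left(1 + \frac{\Delta p_2(X)}{p_2(X)}\right) \approx \frac{\Delta p_2(X)}{p_2(X)} - \frac{\Delta p_1(X)}{p_1(X)} \,.$$

The bias and variance of the Parzen estimates p̂ᵢ(X) therefore enter ĥ(X) directly, which is one way to see why the error estimates in Fig. 7-7 depend so strongly on the kernel size r.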