density estimate classifies samples from the original normal distributions with an improper threshold. Figure 7-7 shows exactly that. In Fig. 7-7(a), with Σ1 = Σ2 = I, good performance was obtained even for large values of r without adjusting the threshold. When |Σ1| and |Σ2| are different, as with Data I-4I and I-Λ, the performance of the Parzen classifier degrades sharply for larger values of r without adjusting the threshold, as evidenced in Fig. 7-7(b) and (c). Figure 7-9 shows the behavior of the Parzen classifier for these three data sets with t given by (7.56) (Option 1). For low values of r, the classifiers give performance similar to that shown in Fig. 7-7, since the appropriate value of t given in (7.56) is close to zero. As r increases, good performance is obtained for all values of r. Thus, by allowing the decision threshold to vary with r, we are able to make the Parzen classifier much less sensitive to the value of r.
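
To make the mechanism concrete, the following sketch (not from the book) implements a two-class Parzen classifier with an adjustable decision threshold t. It assumes isotropic normal kernels of size r and the convention that x is assigned to class 1 when the log-likelihood ratio exceeds t; the function names and the simplified kernel shape are illustrative, not the book's notation.

    import numpy as np
    from scipy.special import logsumexp

    def parzen_log_density(x, samples, r):
        """Log of a Parzen density estimate at x using isotropic normal
        kernels N(sample, r^2 I); computed on the log scale for stability."""
        n, d = samples.shape
        sq_dist = np.sum((samples - x) ** 2, axis=1)
        log_kernels = (-sq_dist / (2.0 * r ** 2)
                       - d * np.log(r) - 0.5 * d * np.log(2.0 * np.pi))
        return logsumexp(log_kernels) - np.log(n)  # log of the kernel average

    def parzen_classify(x, samples1, samples2, r, t=0.0):
        """Assign x to class 1 iff ln p1(x) - ln p2(x) > t."""
        llr = (parzen_log_density(x, samples1, r)
               - parzen_log_density(x, samples2, r))
        return 1 if llr > t else 2

With t = 0 this is the unadjusted classifier whose degradation Fig. 7-7 illustrates; letting t depend on r, for example via (7.56), yields the threshold-adjusted behavior described above.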

The threshold for non-normal distributions: The decision threshold as used here is simply a means of compensating for the bias inherent in the density estimation procedure. When the data and the kernel functions are normal, we have shown that the bias may be completely compensated for by choosing the value of t given in (7.56). In the non-normal case, we cannot hope to obtain a decision rule equivalent to the Bayes classifier simply by varying t. However, by choosing an appropriate value of t, we can hope to compensate, to some extent, for the bias of the density estimates in a region close to the Bayes decision boundary, providing a significant improvement in the performance of the Parzen classifier. Therefore, procedures are needed for determining the best value of t to use when non-normal data is encountered. We present here four possible options, together with a brief discussion of their motivation.

Option 1: Use the threshold as calculated under the normality assumption (7.56). Since for larger values of r the decision rule is dominated by the functional form of the kernels, this procedure may give satisfactory results when the kernels are normal, even if the data is not.
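
In code, Option 1 simply plugs the normality-based threshold into the classifier sketched above. Since (7.56) itself is not reproduced on this page, the helper threshold_normal below is a hypothetical placeholder for that formula:

    def classify_option1(x, samples1, samples2, r, threshold_normal):
        # threshold_normal is a hypothetical callable evaluating (7.56)
        # from the estimated class statistics; the formula is not shown here.
        t = threshold_normal(r)
        return parzen_classify(x, samples1, samples2, r, t)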

Option 2: For each value of r, find the value of t which minimizes the leave-one-out error, and, separately, find the value of t which minimizes the resubstitution error. This option involves finding and sorting the L (leave-one-out) and R (resubstitution) estimates of the likelihood ratio, and incrementing the value of t through these sorted lists; a sketch is given below. The error rate used as the estimate is the minimum error rate obtained over all values of t.
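
A minimal sketch of the leave-one-out half of Option 2, reusing parzen_log_density from the earlier sketch: compute the L log-likelihood ratio for every training sample (deleting the sample from its own class's estimate), sort the pooled values, and sweep t through them, keeping the t with the fewest errors. The resubstitution (R) version is identical except that no sample is deleted.

    def loo_llrs(samples1, samples2, r):
        """Leave-one-out log-likelihood ratios ln p1 - ln p2 for each
        training sample, excluding the sample from its own class's estimate."""
        llr1 = np.array([
            parzen_log_density(x, np.delete(samples1, i, axis=0), r)
            - parzen_log_density(x, samples2, r)
            for i, x in enumerate(samples1)])
        llr2 = np.array([
            parzen_log_density(x, samples1, r)
            - parzen_log_density(x, np.delete(samples2, i, axis=0), r)
            for i, x in enumerate(samples2)])
        return llr1, llr2

    def best_threshold(llr1, llr2):
        """Sweep t through the sorted pooled LLRs; return the t that
        minimizes the total error count."""
        best_t, best_err = 0.0, np.inf
        for t in np.sort(np.concatenate([llr1, llr2])):
            err = np.sum(llr1 <= t) + np.sum(llr2 > t)  # class-1 + class-2 misses
            if err < best_err:
                best_t, best_err = t, err
        return best_t, best_err

The minimum error rate found in this sweep is the estimate the text refers to.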