density estimate classifies samples from the original normal distributions with
an improper threshold. Figure 7-7 shows exactly that. In Fig. 7-7(a), with
Σ₁ = Σ₂ = I, good performance was obtained even for large values of r without
adjusting the threshold. When |Σ₁| and |Σ₂| are different, as with Data I-4I
and I-Λ, the performance of the Parzen classifier degrades sharply for larger
values of r without adjusting the threshold, as evidenced in Fig. 7-7(b) and (c).
Figure 7-9 shows the behavior of the Parzen classifier for these three data sets
with t given by (7.56) (Option 1). For low values of r, the classifiers give
similar performance to that shown in Fig. 7-7, since the appropriate value of t
given in (7.56) is close to zero. As r increases, good performance is obtained
for all values of r. Thus, by allowing the decision threshold to vary with r, we
are able to make the Parzen classifier much less sensitive to the value of r.
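
To make the role of t concrete, the following is a minimal sketch of a two-class Parzen classifier with normal kernels, assuming a kernel covariance of r² times each class's sample covariance; the helper names, that particular kernel scaling, and the sign convention for the log-likelihood ratio and t are illustrative assumptions, not the book's code.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def parzen_log_density(x, samples, r):
    """Log of the Parzen estimate at x, using normal kernels whose
    covariance is r**2 times the sample covariance of the class
    (an assumed scaling for this sketch)."""
    cov = (r ** 2) * np.cov(samples, rowvar=False)
    # Kernel response at each training sample, evaluated in log space.
    log_kernels = multivariate_normal.logpdf(samples - x, cov=cov)
    # log of the average kernel response over the N training samples.
    return logsumexp(log_kernels) - np.log(len(samples))


def parzen_classify(x, class1, class2, r, t=0.0):
    """Assign x to class 1 if ln p1_hat(x) - ln p2_hat(x) > t, else to
    class 2.  t = 0 is the unadjusted rule; letting t vary with r
    compensates for the bias of the density estimates."""
    h = (parzen_log_density(x, class1, r)
         - parzen_log_density(x, class2, r))
    return 1 if h > t else 2
```

Sweeping r with t fixed at 0 corresponds to the experiments of Fig. 7-7, while recomputing t from (7.56) at each r corresponds to Fig. 7-9.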
The threshold for non-normal distributions: The decision threshold as
used here is simply a means of compensating for the bias inherent in the
density estimation procedure. When the data and the kernel functions are normal,
we have shown that the bias may be completely compensated for by choosing
the value of t given in (7.56). In the non-normal case, we cannot hope to
obtain a decision rule equivalent to the Bayes classifier simply by varying t.
However, by choosing an appropriate value of t, we can hope to compensate,
to some extent, for the bias of the density estimates in a region close to the
Bayes decision boundary, providing significant improvement in the performance
of the Parzen classifier. Therefore, procedures are needed for determining
the best value of t to use when non-normal data is encountered. We
present four possible options below, each with a brief discussion of its
motivation.
Option 1: Use the threshold as calculated under the normality assumption
(7.56). Since for larger values of r the decision rule is dominated by the
functional form of the kernels, this procedure may give satisfactory results
when the kernels are normal, even if the data is not normal.
Option 2: For each value of r, find the value of t which minimizes the leave-
one-out error and, separately, the value of t which minimizes the resubstitution
error. This option involves finding and sorting the L and R estimates of the
likelihood ratio and incrementing the value of t through these sorted lists, as
sketched below. The error rate used as the estimate is the minimum error rate
obtained over all values of t.
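
As a sketch of how Option 2 might be implemented, the routine below sweeps t through the sorted log-likelihood-ratio values and keeps the threshold giving minimum error; it would be applied once to the leave-one-out (L) estimates and once to the resubstitution (R) estimates. The names and the brute-force recount at each candidate are illustrative assumptions.

```python
import numpy as np


def best_threshold(h_values, labels):
    """Return (min_error_rate, best_t) for the rule 'class 1 if h > t'.
    h_values: log-likelihood-ratio estimates at the labeled samples;
    labels: the true classes (1 or 2)."""
    h_sorted = np.sort(h_values)
    # Candidate thresholds: one below all h values, the midpoints
    # between consecutive sorted values, and one above all h values.
    candidates = np.concatenate((
        [h_sorted[0] - 1.0],
        (h_sorted[:-1] + h_sorted[1:]) / 2.0,
        [h_sorted[-1] + 1.0],
    ))
    best_err, best_t = np.inf, None
    for t in candidates:
        pred = np.where(h_values > t, 1, 2)
        err = np.mean(pred != labels)
        if err < best_err:
            best_err, best_t = err, t
    return best_err, best_t
```

The full recount at each candidate is kept here for clarity; the procedure described in the text instead updates the error count incrementally as t passes each sorted sample, so that after the initial sort the sweep costs only O(N).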

