classification. One way to overcome this difficulty is to determine the
optimal kernel size experimentally. Assuming that the kernel function of (6.3)
is adopted with r as the size control parameter, we may repeat the estimation of
the classification error by both the L and R methods for various values of r, and
plot the results against r. The major drawback of this approach is that the estimation
procedure must be repeated completely for each value of r; a code sketch of such a
sweep follows Experiment 4 below.
Experiment 4: Estimation of the Parzen errors, L and R
Data: I-I, I-4I, I-Λ (Normal, n = 8)
Sample size: N1 = N2 = 100
No. of trials: τ = 10
Kernel: Normal with A1 = Σ1, A2 = Σ2
Kernel size: r = 0.6-3.0
Threshold: t = 0
Results: Fig. 7-7 [11], [21]
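
To make the sweep concrete, here is a minimal Python sketch of Experiment 4. It is not the book's code: the data are synthetic stand-ins for Data I-I, and the helper names kernel_matrix and parzen_L_R_errors are chosen here for illustration. Equal priors and the threshold t = 0 are assumed, matching the experiment.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Synthetic stand-in for Data I-I (not the book's exact samples): two n = 8
# normals with Sigma1 = Sigma2 = I and means separated along the first axis.
n, N = 8, 100
X1 = rng.multivariate_normal(np.zeros(n), np.eye(n), N)
X2 = rng.multivariate_normal(np.r_[2.56, np.zeros(n - 1)], np.eye(n), N)

def kernel_matrix(Xa, Xb, r, A):
    """K[j, k] = normal kernel with covariance r^2 A, evaluated at Xa[j] - Xb[k]."""
    kern = multivariate_normal(np.zeros(A.shape[0]), r ** 2 * A)
    return np.array([kern.pdf(Xa[j] - Xb) for j in range(len(Xa))])

def parzen_L_R_errors(X1, X2, r, A1, A2, t=0.0):
    """L (leave-one-out) and R (resubstitution) error estimates for the
    Parzen classifier h(X) = -ln(p1hat(X)/p2hat(X)) thresholded at t."""
    N1, N2 = len(X1), len(X2)
    K11, K12 = kernel_matrix(X1, X1, r, A1), kernel_matrix(X1, X2, r, A2)
    K21, K22 = kernel_matrix(X2, X1, r, A1), kernel_matrix(X2, X2, r, A2)

    # R method: every training sample contributes to its own density estimate.
    h1 = -np.log(K11.mean(1) / K12.mean(1))          # h at class-1 samples
    h2 = -np.log(K21.mean(1) / K22.mean(1))          # h at class-2 samples
    errR = ((h1 > t).mean() + (h2 < t).mean()) / 2   # equal priors assumed

    # L method: exclude the test sample from its own class's estimate.
    p1 = (K11.sum(1) - np.diag(K11)) / (N1 - 1)
    p2 = (K22.sum(1) - np.diag(K22)) / (N2 - 1)
    h1 = -np.log(p1 / K12.mean(1))
    h2 = -np.log(K21.mean(1) / p2)
    errL = ((h1 > t).mean() + (h2 < t).mean()) / 2
    return errL, errR

for r in np.linspace(0.6, 3.0, 9):
    eL, eR = parzen_L_R_errors(X1, X2, r, np.eye(n), np.eye(n))
    print(f"r = {r:.1f}:  L = {eL:.3f}  R = {eR:.3f}")
```

Averaging over τ = 10 independent trials, as in the experiment, would simply wrap this sweep in an outer loop over fresh data sets.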
In Fig. 7-7, the upper and lower bounds of the Bayes error were obtained by
the L and R methods, respectively. As seen in Fig. 7-7, the error estimates are
very sensitive to r, except for the Data I-I case. Unless a proper r is chosen,
the estimates are heavily biased and do not necessarily bound the Bayes error.
In order to understand why the error estimates behave as in Fig. 7-7 and
to provide intelligent guidelines for parameter selection, we need a more
detailed analysis of the Parzen error estimation procedure.
Effect of the density estimate: In general, the likelihood ratio classifier
is expressed by
$$h(X) = -\ln \frac{p_1(X)}{p_2(X)} \;\underset{\omega_2}{\overset{\omega_1}{\lessgtr}}\; t \,, \tag{7.43}$$
where t is the threshold. When the estimates of p1(X) and p2(X) are used,
$$\hat{h}(X) = -\ln \frac{\hat{p}_1(X)}{\hat{p}_2(X)} = h(X) + \Delta h(X) \,. \tag{7.44}$$
The discriminant function ĥ(X) is a random
variable and deviates from h(X) by Δh(X).
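
To see the decomposition in (7.44) numerically, the short sketch below (again an illustration, reusing the synthetic data and the kernel_matrix helper from the previous snippet) compares h(X) computed from the true densities with ĥ(X) computed from the Parzen estimates; their difference is the random perturbation Δh(X).

```python
# Illustration of (7.44): hhat(X) = h(X) + Delta h(X). Reuses np, rng, n,
# X1, X2, kernel_matrix, multivariate_normal from the sketch above.
true1 = multivariate_normal(np.zeros(n), np.eye(n))
true2 = multivariate_normal(np.r_[2.56, np.zeros(n - 1)], np.eye(n))

Xtest = rng.multivariate_normal(np.zeros(n), np.eye(n), 5)  # a few class-1 points
h_true = -np.log(true1.pdf(Xtest) / true2.pdf(Xtest))       # h(X) from true densities

r = 2.0                                                     # an arbitrary kernel size
p1_hat = kernel_matrix(Xtest, X1, r, np.eye(n)).mean(1)     # Parzen estimate of p1(X)
p2_hat = kernel_matrix(Xtest, X2, r, np.eye(n)).mean(1)     # Parzen estimate of p2(X)
h_hat = -np.log(p1_hat / p2_hat)                            # hhat(X) as in (7.44)

delta_h = h_hat - h_true                                    # the perturbation Delta h(X)
print(np.c_[h_true, h_hat, delta_h])
```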
The effect of Δh(X) on the classification error can be evaluated from (5.65) as

