$$A_i = C_i + \gamma_i (X - M_i)(X - M_i)^T , \qquad (7.65)$$
where $\gamma_i$ is a constant to be determined by solving (7.62). Substituting (7.63) and (7.65) into (7.62) and simplifying gives

$$d_1^{-2}(X,M_1)\,[1+\gamma_1 d_1^2(X,M_1)] = d_2^{-2}(X,M_2)\,[1+\gamma_2 d_2^2(X,M_2)] , \qquad (7.66)$$
where $d_i^2(X,M_i) = (X-M_i)^T C_i^{-1} (X-M_i)$. If we could select $\gamma_i d_i^2(X,M_i) = -1$, (7.66) would be satisfied. However, since $(X-M_i)^T A_i^{-1} (X-M_i) = d_i^2(X,M_i)/[1+\gamma_i d_i^2(X,M_i)]$ for $A_i$ of (7.65) from (2.160), $\gamma_i d_i^2(X,M_i) > -1$ must be satisfied for $A_i$ to be positive definite. A simple compromise to overcome this inconsistency is to select a number slightly larger than $-1$ for $\gamma_i d_i^2(X,M_i)$. This makes $a_1(X) - a_2(X)$ small, although not zero. This selection of the kernel covariance was tested in the following experiment.
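To make the construction concrete, here is a minimal numerical sketch (ours, not from the text; the matrices $C_i$, $M_i$ and the point $X$ are synthetic) that builds $A_i$ of (7.65) with $\gamma_i d_i^2 = -0.8$, as used in Experiment 8 below, and checks both positive definiteness and the quadratic-form identity quoted from (2.160):

```python
# Sketch: kernel covariance of (7.65) with gamma_i * d_i^2 = -0.8.
import numpy as np

rng = np.random.default_rng(0)
n = 8                                    # dimensionality, as in Experiment 8
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)              # a synthetic positive definite C_i
M = rng.standard_normal(n)               # a synthetic mean M_i
X = rng.standard_normal(n)               # the point where the kernel is tuned

v = X - M
d2 = v @ np.linalg.solve(C, v)           # d_i^2(X,M_i) = (X-M_i)' C_i^{-1} (X-M_i)
gamma = -0.8 / d2                        # enforce gamma_i * d_i^2 = -0.8 > -1
A = C + gamma * np.outer(v, v)           # optimal kernel covariance (7.65)

# Positive definiteness holds because gamma * d2 > -1.
assert np.all(np.linalg.eigvalsh(A) > 0)

# Identity from (2.160) (Sherman-Morrison): the quadratic form under A_i
# equals d_i^2 / (1 + gamma_i * d_i^2).
lhs = v @ np.linalg.solve(A, v)
assert np.isclose(lhs, d2 / (1 + gamma * d2))
print(f"d2 = {d2:.3f}, quadratic form under A = {lhs:.3f}")
```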
Experiment 8: Estimation of the Parzen error, $\hat{\varepsilon}$
   Data: I-I, I-4I, I-Λ (Normal, $n = 8$)
   Sample size: $N_1 = N_2 = 100$ (Design)
                $N_1 = N_2 = 1000$ (Test)
   Kernel: Normal, $A_i$ of (7.65), $\gamma_i d_i^2 = -0.8$
   Kernel size: $r = 0.6 - 2.4$
   Threshold: $t = 0$
   Results: Fig. 7-11
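As a rough illustration of the procedure behind this experiment, the following sketch (ours, not the book's code) estimates the Parzen error by counting misclassified test samples under the likelihood-ratio rule with $t = 0$. It draws synthetic Data I-I-like samples ($\|M_2 - M_1\| = 2.56$, $C_1 = C_2 = I$) and, for simplicity, uses the conventional fixed kernel $A_i = C_i$ rather than the $X$-dependent kernel of (7.65); the helper `log_parzen` is our own naming:

```python
import numpy as np
from scipy.special import logsumexp

def log_parzen(x, design, S):
    """Log of the Parzen estimate at x: average of normal kernels N(x; X_j, S)."""
    D = x - design                               # (N, n) differences x - X_j
    Sinv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum('ij,jk,ik->i', D, Sinv, D)  # (x-X_j)' S^{-1} (x-X_j)
    logk = -0.5 * (quad + len(x) * np.log(2 * np.pi) + logdet)
    return logsumexp(logk) - np.log(len(design))

rng = np.random.default_rng(1)
n, N_design, N_test, r = 8, 100, 1000, 1.0
M1 = np.zeros(n)
M2 = np.full(n, 2.56 / np.sqrt(n))               # ||M2 - M1|| = 2.56
C1 = C2 = np.eye(n)

X1 = rng.multivariate_normal(M1, C1, N_design)   # design sets
X2 = rng.multivariate_normal(M2, C2, N_design)
T1 = rng.multivariate_normal(M1, C1, N_test)     # test sets
T2 = rng.multivariate_normal(M2, C2, N_test)

S1, S2 = r**2 * C1, r**2 * C2                    # kernel covariances r^2 * A_i
# Rule: assign x to class 1 when ln p1(x) - ln p2(x) > t = 0.
e1 = np.mean([log_parzen(x, X1, S1) - log_parzen(x, X2, S2) < 0 for x in T1])
e2 = np.mean([log_parzen(x, X1, S1) - log_parzen(x, X2, S2) > 0 for x in T2])
print(f"estimated Parzen error: {(e1 + e2) / 2:.4f}")
```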
The optimal kernels given in (7.65) were scaled to satisfy $|A_i| = |C_i|$, allowing direct comparison with the results obtained using the more conventional kernel $A_i = C_i$ (also shown in Fig. 7-11). The results for Data I-4I and I-Λ indicate that, although the estimates seem less stable at smaller values of $r$, as $r$ grows the results using (7.65) remain close to the Bayes error while the results using $A_i = C_i$ degrade rapidly. This implies that the $r^2$ and $r^4$ terms of (7.50) and (7.51) have been effectively reduced. Note that for Data I-I (Fig. 7-11(a)), the distributions were chosen so that $a_1(X) = a_2(X)$ on the Bayes decision boundary. As a result, the $r^2$ and $r^4$ terms of (7.50) and (7.51) are already zero, and no improvement is observed by changing the kernel. These experimental results indicate the potential importance of the kernel covariance in designing Parzen classifiers.
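As an aside on the scaling $|A_i| = |C_i|$ used above, the required factor follows from the matrix determinant lemma (a standard identity, not restated in this section); a short derivation of ours:

$$|A_i| = |C_i + \gamma_i (X-M_i)(X-M_i)^T| = |C_i|\,[1+\gamma_i d_i^2(X,M_i)] ,$$

so dividing $A_i$ by $[1+\gamma_i d_i^2(X,M_i)]^{1/n}$ restores $|A_i| = |C_i|$; with $\gamma_i d_i^2 = -0.8$ and $n = 8$, the divisor is $0.2^{1/8} \approx 0.818$.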

