Page 357 - Introduction to Statistical Pattern Recognition


7 Nonparametric Classification and Error Estimation 339

Estimated kernel covariance: So far, we have assumed that the class covariance matrices are known and given. In practice, however, these covariance matrices are unknown and must be estimated from a finite number of samples. This may lead to an optimistically biased error estimate. For example, in Data RADAR, upper and lower bounds of the Bayes error are estimated by the L and R methods. If 720 samples per class are used in this operation, with the class covariance matrices estimated from 8800 samples per class, the resulting upper and lower bounds are 17.8% and 16.2%, respectively. On the other hand, if the same 720 samples are also used to estimate the covariance matrices, bounds of 8.4% and 7.1% result. These bounds are further lowered to 5.2% and 3.8% when 360 samples per class are used for both the error estimation and the covariance estimation. These results demonstrate that the upper bound of the Bayes error given by the L method may be severely biased. Thus, if the class covariances are estimated from the same data used to form the error estimates, the L estimate may no longer be an upper bound of the Bayes error. To avoid this bias, one should, if possible, estimate the class covariances from a large number of independent samples. Once the covariances are estimated accurately, a relatively small sample size suffices for the nonparametric procedures to produce reliable results. However, if additional samples for estimating the covariance matrices are not available, one must use leave-one-out type estimates of the kernel covariances when forming the L error estimate in order to obtain reliable upper bounds on the Bayes error. This implies the use of a different covariance matrix for each sample being tested.
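The direction of this bias can be seen in a much simpler setting than the Parzen experiment on Data RADAR. The sketch below (NumPy; the synthetic Gaussian data and the choices of dimension `n` and sample size `N` are illustrative assumptions, and the function names are hypothetical) compares each sample's squared Mahalanobis distance to its class mean when the mean and covariance are estimated with that sample included (resubstitution) and with it left out. Included samples look systematically closer, which is the same mechanism that pushes the error bounds downward.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))

def resub_and_loo_distances(X):
    """For every row of X, return its squared distance to the class mean
    (a) with mean and covariance estimated from all N rows, and
    (b) with mean and covariance estimated from the other N - 1 rows."""
    N = X.shape[0]
    mean_all = X.mean(axis=0)
    cov_all = np.cov(X, rowvar=False)            # unbiased: divides by N - 1
    resub, loo = np.empty(N), np.empty(N)
    for k in range(N):
        resub[k] = mahalanobis_sq(X[k], mean_all, cov_all)
        rest = np.delete(X, k, axis=0)           # leave X_k out
        loo[k] = mahalanobis_sq(X[k], rest.mean(axis=0),
                                np.cov(rest, rowvar=False))
    return resub, loo

rng = np.random.default_rng(0)
n, N = 8, 40                                     # illustrative dimension and sample size
X = rng.standard_normal((N, n))                  # true covariance is the identity
resub, loo = resub_and_loo_distances(X)
print(f"resubstitution mean d^2: {resub.mean():.3f}")  # equals n(N-1)/N = 7.8 (up to rounding)
print(f"leave-one-out mean d^2:  {loo.mean():.3f}")    # larger than the resubstitution value
```

The resubstitution distances always sum to (N-1)n, so their average sits below n whatever the data, while a left-out sample is measured with a mean and covariance it did not influence.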
In order to show how the kernel covariance can be estimated by the L method, let us study the kernel function of (6.3). In Parzen error estimation, this kernel function is inserted into (7.2) and (7.3) to test a sample $X_k^{(i)}$ from $\omega_i$ in the L method. Using $A_j = \Sigma_j$, which is a good choice in many applications, we need to compute $|\Sigma_j|$ and

$$d_j^2(X_k^{(i)}, X_\ell^{(j)}) = (X_k^{(i)} - X_\ell^{(j)})^T \Sigma_j^{-1} (X_k^{(i)} - X_\ell^{(j)}) .$$
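Both class-wise quantities named here, $|\Sigma_j|$ and the pairwise distances $d_j^2$, can be obtained from a single Cholesky factorization of $\Sigma_j$. The following is a minimal NumPy sketch, assuming the tested samples are the rows of `X_test` and the class-$j$ samples the rows of `X_j` (both names, and `kernel_quantities`, are hypothetical). The kernel of (6.3) itself is not reproduced; the sketch returns only its two ingredients.

```python
import numpy as np

def kernel_quantities(X_test, X_j, Sigma_j):
    """Return log|Sigma_j| and a matrix D2 with
    D2[k, l] = (X_test[k] - X_j[l])^T Sigma_j^{-1} (X_test[k] - X_j[l])."""
    L = np.linalg.cholesky(Sigma_j)                # Sigma_j = L L^T (must be positive definite)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log|Sigma_j| from the Cholesky factor
    n = Sigma_j.shape[0]
    diffs = X_test[:, None, :] - X_j[None, :, :]   # all pairwise differences, shape (K, N_j, n)
    # Solve L z = diff for every pair at once; then d^2 = z^T z.
    Z = np.linalg.solve(L, diffs.reshape(-1, n).T)  # shape (n, K * N_j)
    D2 = np.sum(Z**2, axis=0).reshape(diffs.shape[:2])
    return log_det, D2

# Illustrative use with made-up data: a 2-D class j and three test points.
rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X_j = rng.multivariate_normal([0.0, 0.0], Sigma, size=5)
X_test = rng.standard_normal((3, 2))
log_det, D2 = kernel_quantities(X_test, X_j, Sigma)
print(D2.shape)   # (3, 5)
```

Returning $\log|\Sigma_j|$ rather than $|\Sigma_j|$ is a design choice: it keeps the kernel's normalizing constant from underflowing or overflowing in higher dimensions.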
When the covariance matrix $\Sigma_j$ is not known and needs to be estimated from $X_1^{(j)}, \ldots, X_{N_j}^{(j)}$, $\Sigma_j$ is replaced by its estimate, $\hat{\Sigma}_j$, and subsequently $d_j^2$ by

$$\hat{d}_j^2(X_k^{(i)}, X_\ell^{(j)}) = (X_k^{(i)} - X_\ell^{(j)})^T \hat{\Sigma}_j^{-1} (X_k^{(i)} - X_\ell^{(j)}) .$$

The L type estimate of the kernel covariance means that, when $X_k^{(i)}$ is tested, $X_k^{(i)}$ must be excluded from the sample set used to estimate $\Sigma_i$. Letting $\hat{\Sigma}_{ik}$ be the resulting estimate, $|\hat{\Sigma}_i|$ and $\hat{d}_i^2$ now must be replaced by $|\hat{\Sigma}_{ik}|$ and $\hat{d}_{ik}^2$, while $|\hat{\Sigma}_j|$ and $\hat{d}_j^2$ for $j \neq i$ are kept unchanged.
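Recomputing $\hat{\Sigma}_{ik}$ from scratch for every tested sample costs a full inversion per sample. Assuming the estimate is the unbiased sample covariance (dividing by $N_i - 1$), removing one sample is a rank-one downdate: with $D = X_k^{(i)} - \hat{M}_i$, $(N_i - 2)\hat{\Sigma}_{ik} = (N_i - 1)\hat{\Sigma}_i - \frac{N_i}{N_i - 1} D D^T$. The sketch below applies the matrix determinant lemma and the Sherman-Morrison formula to this identity so that $|\hat{\Sigma}_{ik}|$ and $\hat{d}_{ik}^2$ come from $\hat{\Sigma}_i$ and $\hat{\Sigma}_i^{-1}$ alone. It is one standard route, offered as an illustration rather than as the text's own derivation; the function name and data are hypothetical.

```python
import numpy as np

def loo_kernel_quantities(X, k, Y):
    """Leave-one-out quantities for testing X[k] against the rows of Y.

    X is the (N, n) sample matrix of the tested sample's own class.  With S
    the unbiased covariance of all N rows and Sigma_k that of the N - 1 rows
    other than X[k], return |Sigma_k| and the vector of
    (X[k] - y)^T Sigma_k^{-1} (X[k] - y) for each row y of Y, using only S
    and S^{-1} (rank-one downdate; requires N >= 3).
    """
    N, n = X.shape
    S = np.cov(X, rowvar=False)                # divides by N - 1
    S_inv = np.linalg.inv(S)
    d = X[k] - X.mean(axis=0)                  # D = X_k - sample mean
    a = N / (N - 1) ** 2
    t = d @ S_inv @ d
    denom = 1.0 - a * t                        # > 0 whenever Sigma_k is nonsingular
    # Determinant lemma on (N-2) Sigma_k = (N-1) S - N/(N-1) D D^T.
    det_k = ((N - 1) / (N - 2)) ** n * np.linalg.det(S) * denom
    # Sherman-Morrison: Sigma_k^{-1} = (N-2)/(N-1) [S^{-1} + a S^{-1} D D^T S^{-1} / denom].
    V = X[k] - Y                               # one difference vector per row of Y
    q = np.einsum("li,ij,lj->l", V, S_inv, V)  # v^T S^{-1} v for each row v of V
    p = V @ S_inv @ d                          # v^T S^{-1} D for each row v of V
    d2_k = (N - 2) / (N - 1) * (q + a * p**2 / denom)
    return det_k, d2_k

# Test X_k against the other members of its own (made-up) class.
rng = np.random.default_rng(2)
X_i = rng.standard_normal((12, 3))
k = 4
Y = np.delete(X_i, k, axis=0)                  # X_l for l != k
det_k, d2_k = loo_kernel_quantities(X_i, k, Y)

# Brute-force check: recompute the covariance without X_k.
S_k = np.cov(Y, rowvar=False)
print(np.isclose(det_k, np.linalg.det(S_k)))   # True
```

With $|\hat{\Sigma}_i|$ and $\hat{\Sigma}_i^{-1}$ computed once per class, each tested sample then needs only matrix-vector products instead of a fresh inversion.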
When the sample covariance of (5.9) is used, $|\hat{\Sigma}_{ik}|$ and $\hat{d}_{ik}^2$ can be easily
