Page 237 - Introduction to Statistical Pattern Recognition

P. 237

5 Parameter Estimation 219

5.3 Holdout, Leave-One-Out, and Resubstitution Methods

Bounds of the Bayes Error

When a finite number of samples is given and the performance of a
specified classifier is to be estimated, we need to decide which samples are
used for designing the classifier and which samples are for testing the
classifier.

Upper and lower bounds of the Bayes error: In general, the
classification error is a function of two sets of data, the design and test sets,
and may be expressed by

where, F is a set of two densities as
lF = (P I(X), P2(X)I . (5.1 10)
If the classifier is the Bayes for the given test distributions, the resulting error
is minimum. Therefore, we have the following inequality
E(PT,P;7) I E(P'D,PT) . (5.1 11)

The Bayes error for the true I" is E(P,P). However, we never know the true
,.
5). One way to overcome this difficulty is to find upper and lower bounds of
A *
E(P,P) based on its estimate @' = (pI(X),p2(X)}. In order to accomplish this,
let us introduce from (5.1 11) two inequalities as
E(P,F) I P) , (5.1 12)
E(;,
(5.1 13)

1
Equation (5.1 12) indicates that F is the better design set than 0' for testing F.
Likewise, ; better design set than P for testing ;. Also, it is known
is
the
from (5.48) that, if an error counting procedure is adopted, the error estimate is
unbiased with respect to test samples. Therefore, the right-hand side of (5.112)
can be modified to
1 1..
E( 5', F) = E & ( E( 5', PT) } , (5.1 14)
,.
1
where PT is another set generated from D independently of P. Also, after tak-
ing the expectation of (5.1 13), the right-hand side may be replaced by

232 233 234 235 236 237 238 239 240 241 242