$E_{\hat{P}_T}\{\hat{\varepsilon}(\hat{P}, \hat{P}_T)\} = \varepsilon(\hat{P}, P)$ . (5.115)

Thus, combining (5.112)-(5.115),

$E\{\hat{\varepsilon}(\hat{P}, \hat{P})\} \le \varepsilon(P, P) \le E_{\hat{P}_T}\{\hat{\varepsilon}(\hat{P}, \hat{P}_T)\}$ . (5.116)
That is, the Bayes error, $\varepsilon(P, P)$, is bounded by two sample-based estimates [12].
The rightmost term is obtained by generating two independent
sample sets, $\hat{P}$ and $\hat{P}_T$, from $P$, and using $\hat{P}$ for designing the Bayes classifier
and $\hat{P}_T$ for testing. The expectation of this error with respect to $\hat{P}_T$ gives the
upper bound of the Bayes error. Furthermore, taking the expectation of this
result with respect to $\hat{P}$ does not change this inequality. Therefore,
$E_{\hat{P}} E_{\hat{P}_T}\{\hat{\varepsilon}(\hat{P}, \hat{P}_T)\}$ also can be used as the upper bound. This procedure is
called the holdout (H) method. On the other hand, $\hat{\varepsilon}(\hat{P}, \hat{P})$ is obtained by
using $\hat{P}$ for designing the Bayes classifier and the same $\hat{P}$ for testing. The
expectation of this error with respect to $\hat{P}$ gives the lower bound of the Bayes
error. This procedure is called the resubstitution (R) method.
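To make (5.116) concrete, the following is a minimal Monte Carlo sketch. It assumes two unit-variance Gaussian classes with equal priors, designs a plug-in classifier from each design set by thresholding at the midpoint of the sample means (an approximation to the Bayes classifier designed from $\hat{P}$), and averages the resubstitution and holdout error counts over many trials. The class means, sample size, and trial count are illustrative assumptions, not values from the text.

```python
# A minimal Monte Carlo sketch of the bound (5.116); the Gaussian
# class means, sample size, and trial count are illustrative
# assumptions, not values from the text.
import math
import numpy as np

rng = np.random.default_rng(0)
m0, m1, n, trials = 0.0, 2.0, 20, 5000

# True Bayes error for these classes: the optimal threshold is the
# midpoint of the means, giving Phi(-(m1 - m0)/2) = Phi(-1).
bayes = 0.5 * (1.0 + math.erf((-(m1 - m0) / 2) / math.sqrt(2)))

resub, hold = [], []
for _ in range(trials):
    d0, d1 = rng.normal(m0, 1, n), rng.normal(m1, 1, n)  # design set P_hat
    t0, t1 = rng.normal(m0, 1, n), rng.normal(m1, 1, n)  # test set P_hat_T
    thr = (d0.mean() + d1.mean()) / 2  # classifier designed from P_hat

    def err(x0, x1):
        # Equal priors: average the two conditional error rates.
        return ((x0 > thr).mean() + (x1 <= thr).mean()) / 2

    resub.append(err(d0, d1))  # epsilon_hat(P_hat, P_hat):   R method
    hold.append(err(t0, t1))   # epsilon_hat(P_hat, P_hat_T): H method

print(f"E{{R}} = {np.mean(resub):.4f} <= Bayes = {bayes:.4f}"
      f" <= E{{H}} = {np.mean(hold):.4f}")
```

With a design set this small, the averaged resubstitution error should fall below the true Bayes error $\Phi(-1) \approx 0.1587$ while the averaged holdout error lies above it, in keeping with (5.116).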
The holdout method works well if the data sets are generated artificially
by a computer. However, in practice, if only one set of data is available, in
order to apply the holdout method, we need to divide the available sample set
into two independent groups. This reduces the number of samples available
for designing and testing. Also, how to divide samples is a serious and non-
trivial problem. We must implement a proper dividing algorithm to assure that
the distributions of design and test samples are very close. In addition, how to
allocate samples to design and test is another problem. Since we know the
effects of design and test sample sizes from the discussion of Section 5.2, we
may determine how samples should be allocated to design and test by balanc-
ing the resulting bias and variance. As seen in Section 5.2, the bias is deter-
mined by the size of the design samples, while the variance is primarily deter-
mined by the size of the test samples.
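As one concrete illustration of such a division, the sketch below uses stratified sampling so that the class proportions of the design and test halves match. It assumes scikit-learn is available; the synthetic data and the 50/50 allocation are illustrative choices, and the allocation ratio would in practice be tuned by balancing the bias and variance effects noted above.

```python
# A minimal sketch of splitting one sample set into design and test
# halves, assuming scikit-learn is available.  Stratifying on the
# class labels keeps the class proportions of the two halves matched;
# the 50/50 allocation is an illustrative choice.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)

X_design, X_test, y_design, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
```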
A procedure, called the leave-one-out (L) method, alleviates the above
difficulties of the H method [13]. In the L method, one sample is excluded, the
classifier is designed on the remaining N-1 samples, and the excluded sample
is tested by the classifier. This operation is repeated N times to test all N sam-
ples. Then, the number of misclassified samples is counted to obtain the esti-
mate of the error.
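The following is a minimal sketch of the L method, using the same kind of plug-in threshold classifier as in the earlier sketch; the two Gaussian classes and the sample size are illustrative assumptions.

```python
# A minimal sketch of the leave-one-out (L) method for a plug-in
# Gaussian threshold classifier; the data are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.concatenate([rng.normal(0.0, 1, n), rng.normal(2.0, 1, n)])
y = np.repeat([0, 1], n)

errors = 0
for i in range(len(X)):                # exclude sample i, ...
    mask = np.arange(len(X)) != i
    Xd, yd = X[mask], y[mask]          # ... design on the N-1 others, ...
    thr = (Xd[yd == 0].mean() + Xd[yd == 1].mean()) / 2
    predicted = int(X[i] > thr)        # ... and test the excluded sample.
    errors += (predicted != y[i])

print(f"leave-one-out error estimate: {errors / len(X):.4f}")
```

Note that all N samples are used for testing, while each of the N classifiers is designed on N-1 samples, which is how the L method avoids the sample-splitting difficulties of the H method.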