4.2 Bayesian Classification
Formula (4-30) allows the computation of confidence interval estimates for $\hat{P}e$,
by substituting $\hat{P}e$ in place of $Pe$ and using the normal distribution approximation
for sufficiently large n (say, $n \geq 25$). Notice that the intervals have zero width for the extreme cases
of $Pe=0$ or $Pe=1$. Furthermore, as this formula is independent of the classifier
model, its value is to be considered a worst-case value, yielding in many
circumstances unrealistically large intervals.
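The interval computation described above can be sketched as follows. This is only an illustration: it assumes that formula (4-30) is the binomial standard deviation $\sqrt{Pe(1-Pe)/n}$, which is consistent with the zero-width behaviour at $Pe=0$ and $Pe=1$ but is not reproduced in this section. The function name `error_confidence_interval` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def error_confidence_interval(pe_hat, n, level=0.95):
    """Normal-approximation confidence interval for an error estimate.

    Assumption: formula (4-30) is the binomial standard deviation
    sqrt(Pe * (1 - Pe) / n), with the estimate pe_hat substituted for Pe.
    """
    if not 0.0 <= pe_hat <= 1.0:
        raise ValueError("pe_hat must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard deviation of the error estimate (assumed form of 4-30).
    sigma = math.sqrt(pe_hat * (1.0 - pe_hat) / n)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Clip to [0, 1], since an error probability cannot leave that range.
    lower = max(0.0, pe_hat - z * sigma)
    upper = min(1.0, pe_hat + z * sigma)
    return lower, upper, sigma


if __name__ == "__main__":
    print(error_confidence_interval(0.1, 100))
    print(error_confidence_interval(0.0, 50))
```

For an estimate of exactly 0 or 1 the standard deviation vanishes and the interval collapses to a single point, which is the degenerate case noted in the text.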
In normal practice we compute $\hat{P}e_d$ by designing and evaluating the classifier in
the same set with n patterns, $\hat{P}e_d(n)$. This error estimate is related to an empirical
risk, mentioned already in section 4.2.1. As for $\hat{P}e_t$, we may compute it using an
independent set of n patterns, $\hat{P}e_t(n)$. In order to have some guidance on how to
choose an appropriate dimensionality ratio, we would like to know the deviation of
the expected values of these estimates from the Bayes error, where the expectation
is computed on a population of classifiers of the same type and trained in the same
conditions. Formulas for these expectations, $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$, are quite
intricate and can only be computed numerically. Like formula (4-25), they depend
on the Bhattacharyya distance. The bibliography section includes references where
these formulas for two classes with normal distributions can be found, namely
Foley (1972) and Raudys and Pikelis (1980). A software tool, PR Size, computing
these formulas for the linear discriminant case is included in the CD distributed
with the book. PR Size also allows the computation of confidence intervals of these
estimates, using (4-30).
Figure 4.26 is obtained with PR Size and illustrates how the expected values of
the error estimates evolve with n patterns (assumed here to be the number of
patterns in each class), in the situation of equal covariance. Both curves have an
asymptotic behaviour with $n \to \infty$ (see footnote 6), with the average design set error estimate
converging to the Bayes error (related to the optimal risk) from below and the
average test set error estimate converging from above.
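The behaviour of the two averages can be illustrated with a small Monte Carlo sketch. It does not use PR Size or the formulas of Foley (1972) and Raudys and Pikelis (1980). It simply trains many classifiers on simulated data and averages their design set and test set errors. To stay short it makes simplifying assumptions: two normal classes with identity covariance treated as known, so that only the class means are estimated and the linear discriminant reduces to a nearest-mean rule. The function name, the distance `delta` between class means, and all default values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_error_estimates(d=7, n=7, delta=2.0, trials=300, seed=0):
    """Average design-set and test-set errors over many trained classifiers.

    Assumptions (not from the text): identity covariance known in advance,
    class means `delta` apart along the first axis, n patterns per class.
    """
    rng = random.Random(seed)
    mu = [[0.0] * d, [delta] + [0.0] * (d - 1)]

    def draw(label, count):
        # Sample `count` patterns from the normal class `label`.
        return [[rng.gauss(m, 1.0) for m in mu[label]] for _ in range(count)]

    def centroid(points):
        return [sum(col) / len(points) for col in zip(*points)]

    def error_rate(sets, w, bias):
        # sets[label] holds the patterns whose true class is `label`.
        wrong = total = 0
        for label, points in enumerate(sets):
            for x in points:
                score = sum(wi * xi for wi, xi in zip(w, x)) + bias
                predicted = 1 if score > 0 else 0
                wrong += predicted != label
                total += 1
        return wrong / total

    design_errors, test_errors = [], []
    for _ in range(trials):
        train = [draw(0, n), draw(1, n)]
        test = [draw(0, n), draw(1, n)]  # independent set of the same size
        m0, m1 = centroid(train[0]), centroid(train[1])
        # Linear discriminant through the midpoint of the estimated means.
        w = [b1 - a0 for a0, b1 in zip(m0, m1)]
        bias = -0.5 * sum(wi * (a0 + b1) for wi, a0, b1 in zip(w, m0, m1))
        design_errors.append(error_rate(train, w, bias))
        test_errors.append(error_rate(test, w, bias))

    # Bayes error for two equal-prior normal classes at distance delta.
    bayes_error = NormalDist().cdf(-delta / 2.0)
    return mean(design_errors), bayes_error, mean(test_errors)


if __name__ == "__main__":
    for n in (7, 21, 70):
        print(n, simulate_error_estimates(n=n))
```

Under these assumptions the average design set error should fall below the Bayes error and the average test set error above it, with both gaps narrowing as n grows.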
Both standard deviations, which can be inspected in text boxes for a selected
value of n/d, are initially high for low values of n and converge slowly to zero with
$n \to \infty$. For the situation shown in Figure 4.26, the standard deviation of $\hat{P}e_d(n)$
changes from 0.089 for n=d (14 patterns, 7 per class) to 0.033 for n=10d (140
patterns, 70 per class).
Based on the behaviour of the $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$ curves some criteria
can be established for the dimensionality ratio. As a general rule of thumb, using
dimensionality ratios above 3 is recommended.
6 Numerical approximations in the computation of the average test set error may result in a
deviation from this asymptotic behaviour, for sufficiently large n.