

[Figure: curve of Pe (vertical axis, 0 to 0.5) against δ² (horizontal axis, 0 to 20).]
Figure 6.13. Error probability of a Bayesian two-class discrimination with normal distributions and equal prevalences and covariance.
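The curve of Figure 6.13 follows from the well-known expression for this situation, Pe = 1 − Φ(δ/2), where Φ is the standard normal cumulative distribution function and δ² is the squared Mahalanobis distance between the class means. A minimal R sketch reproducing the curve, assuming that expression:

   # Pe of the two-class Bayesian classifier, equal covariance and prevalences
   delta2 <- seq(0, 20, by = 0.1)   # squared Mahalanobis distance
   pe <- pnorm(-sqrt(delta2) / 2)   # Pe = Phi(-delta/2) = 1 - Phi(delta/2)
   plot(delta2, pe, type = "l", xlab = expression(delta^2), ylab = "Pe",
        ylim = c(0, 0.5))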


           6.3.3 Dimensionality Ratio and Error Estimation

The Mahalanobis and the Bhattacharyya distances can only increase (or stay the same) when more features are added, since every added feature contributes a non-negative amount to the distance. This is certainly the case when the true values of the means and covariances are available, which, in practical applications, they are not.
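As a simple illustration of the first claim (with made-up parameter values, not from the cork-stopper data), the following R fragment computes the squared Mahalanobis distance of the class means using one feature and then two; with the true parameters the second value can never be smaller than the first:

   mu.diff <- c(2, 0.5)    # difference of the class means (hypothetical)
   S <- diag(2)            # common covariance matrix (hypothetical)
   d2.one <- mu.diff[1]^2 / S[1, 1]                     # delta^2, one feature
   d2.two <- drop(t(mu.diff) %*% solve(S) %*% mu.diff)  # delta^2, two features
   c(d2.one, d2.two)       # 4 and 4.25: the distance did not decrease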
When using a large number of features we run into numerical difficulties in obtaining a good estimate of Σ⁻¹, given the finiteness of the training set. Surprising results can then be expected; for instance, the performance of the classifier may degrade when more features are added, instead of improving.
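The following R simulation (an illustration with artificial Gaussian data, not the book's cork-stopper experiment) shows this behaviour: with only 25 training cases per class, the test error of a linear discriminant first falls and then rises as weakly informative features are added:

   library(MASS)                     # for lda() and mvrnorm()
   set.seed(1)
   n <- 25                           # training cases per class, deliberately small
   d.max <- 20
   mu <- c(1, rep(0.1, d.max - 1))   # only the first feature separates well
   err <- sapply(2:d.max, function(d) {
     tr <- rbind(mvrnorm(n, mu[1:d], diag(d)), mvrnorm(n, rep(0, d), diag(d)))
     te <- rbind(mvrnorm(500, mu[1:d], diag(d)), mvrnorm(500, rep(0, d), diag(d)))
     fit <- lda(tr, grouping = rep(1:2, each = n))
     mean(predict(fit, te)$class != rep(1:2, each = 500))  # independent test error
   })
   plot(2:d.max, err, type = "b", xlab = "number of features, d",
        ylab = "test set error")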
Figure 6.14 shows the classification matrix for the two-class, cork-stopper problem, using the whole ten-feature set and equal prevalences. The training set performance did not increase significantly compared with the two-feature solution presented previously, and is worse than the solution using the four-feature vector [ART PRM NG RAAR]’, as shown in Figure 6.14b.
There are, however, further compelling reasons for not using a large number of features. In fact, when using estimates of means and covariance derived from a training set, we are designing a biased classifier, fitted to the training set. Therefore, we should expect that our training set error estimates are, on average, optimistic. On the other hand, error estimates obtained in independent test sets are expected to be, on average, pessimistic. It is only when the number of cases, n, is sufficiently larger than the number of features, d, that we can expect our classifier to generalise, that is, to perform equally well when presented with new cases. The n/d ratio is called the dimensionality ratio.
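A small R simulation (again with artificial Gaussian data, for illustration only) makes this bias visible: averaged over many runs, the resubstitution (training set) error stays below the error measured on an independent test set:

   library(MASS)
   set.seed(2)
   n <- 30; d <- 8     # few cases per class relative to the number of features
   one.run <- function() {
     g.tr <- rep(1:2, each = n); g.te <- rep(1:2, each = 500)
     tr <- rbind(mvrnorm(n, rep(0, d), diag(d)),
                 mvrnorm(n, rep(0.3, d), diag(d)))
     te <- rbind(mvrnorm(500, rep(0, d), diag(d)),
                 mvrnorm(500, rep(0.3, d), diag(d)))
     fit <- lda(tr, grouping = g.tr)
     c(train = mean(predict(fit, tr)$class != g.tr),   # resubstitution estimate
       test  = mean(predict(fit, te)$class != g.te))   # independent test estimate
   }
   rowMeans(replicate(50, one.run()))  # training error < test error, on average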
The choice of an adequate dimensionality ratio has been studied by several authors (see References). Here, we present some important results as an aid for the designer to choose sensible values for the n/d ratio. Later, when we discuss the topic of classifier evaluation, we will come back to this issue from another perspective.