4.2 Bayesian Classification 105
reject region, i.e., 9% of the patterns. The misclassifications are now 1 pattern for
class 1 (2%) and 5 patterns for class 2 (10%), therefore an overall error of 6%.
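These figures can be checked with a short computation. Note that the 50-patterns-per-class count used below is inferred from the quoted percentages (1 pattern = 2%, 5 patterns = 10%), not stated explicitly in the text:

```python
# Sketch verifying the quoted error rates, assuming 50 patterns per class
# (inferred from 1 pattern = 2% and 5 patterns = 10%).
n_per_class = 50
errors = {"class 1": 1, "class 2": 5}

# Per-class error rates and the overall error over both classes.
rates = {c: e / n_per_class for c, e in errors.items()}
overall = sum(errors.values()) / (2 * n_per_class)

print(rates)    # per-class error rates
print(overall)  # overall error
```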
[Classification matrix display: rows are the observed classifications, columns the predicted classifications.]
Figure 4.24. Classification matrices of two classes of cork stoppers with
prevalences adjusted for the reject region boundaries.
4.2.4 Dimensionality Ratio and Error Estimation
As already pointed out in section 2.7, the dimensionality ratio issue is an essential
one when designing a classifier. An adequately high dimensionality ratio will
guarantee that the designed classifier has reproducible results, i.e., that it performs
equally well when presented with new patterns. Looking at the Mahalanobis and
Bhattacharyya distance formulas, it is clear that they can only increase when more
and more features are added. This would certainly be the case if we had the true
values of the means and covariances available, which, in practical applications,
we do not.
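This monotone behaviour of the Mahalanobis distance can be illustrated numerically. The sketch below uses hypothetical means and a hypothetical covariance matrix (not the cork-stoppers data): with the true parameters, the squared Mahalanobis distance between the class means never decreases as features are added.

```python
import numpy as np

# Illustrative sketch with synthetic parameters: the squared Mahalanobis
# distance between two class means, computed on the first k features with
# the corresponding covariance submatrix, is non-decreasing in k.
rng = np.random.default_rng(0)
d = 6
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
C = A @ A.T + d * np.eye(d)  # a valid (positive definite) covariance

def mahalanobis_sq(m1, m2, cov):
    diff = m1 - m2
    return float(diff @ np.linalg.solve(cov, diff))

dists = [mahalanobis_sq(mu1[:k], mu2[:k], C[:k, :k]) for k in range(1, d + 1)]
print(dists)  # each entry is >= the previous one
```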
When using a large number of features, as already pointed out in sections 2.3
and 2.7, we will run into numerical trouble in obtaining a good estimate of C⁻¹, given
the finiteness of the training set. Surprising results can then be expected; for
instance, the performance of the classifier can degrade when more features are
added, instead of improving.
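The estimation problem behind this effect can be sketched with synthetic data (again, not the cork-stoppers set): with a fixed number of training patterns, the sample covariance matrix becomes increasingly ill-conditioned as features are added, so its inverse C⁻¹ grows unreliable.

```python
import numpy as np

# Sketch of covariance estimation with a fixed training-set size: as the
# number of features d approaches the number of patterns n, the sample
# covariance becomes nearly singular (large condition number).
rng = np.random.default_rng(1)
n = 30  # fixed number of training patterns
conds = []
for d in (2, 10, 25, 29):
    X = rng.normal(size=(n, d))          # true covariance is the identity
    C_hat = np.cov(X, rowvar=False)      # d x d sample covariance
    conds.append(np.linalg.cond(C_hat))
    print(d, conds[-1])                  # condition number grows with d
```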
Figure 4.25. Classification results of two classes of cork stoppers using: (a) Ten
features; (b) Four features.