108 4 Statistical Classification
Figure 4.26. Two-class linear discriminant E[Ped(n)] and E[Pet(n)] curves, for
d = 7 and δ² = 3, below and above the dotted line, respectively. The dotted line
represents the Bayes error (0.193).
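The quoted Bayes error can be checked under the usual two-Gaussian, equal-covariance model, which the caption's δ² = 3 (squared Mahalanobis distance between the class means) suggests; in that setting the Bayes error is 1 − Φ(δ/2). The following sketch assumes that model:

```python
# Sketch, assuming two Gaussian classes with equal covariance and
# squared Mahalanobis distance delta^2 = 3 between the class means.
import math

def std_normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

delta = math.sqrt(3.0)
bayes_error = 1.0 - std_normal_cdf(delta / 2.0)
print(round(bayes_error, 3))  # ≈ 0.193, matching the figure caption
```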
For precise criteria concerning the deviation of the expected values of Ped(n)
and Pet(n) from Pe, the magnitude of the standard deviations, and therefore the
95% confidence interval of the estimates, it is advisable to use the PRSize program.
If the patterns are not equally distributed among the classes, it is advisable to
use the smallest number of patterns per class as the value of n. Notice also that a
multi-class problem with absolute separation of the classes can be seen as a
generalization of a two-class problem (see section 2.1.2). Therefore, the total
number of needed training samples, for a given deviation of the expected error
estimates from the Bayes error, can be estimated as cn*, where n* is the particular
value of n that achieves such a deviation in the most unfavourable two-class
dichotomy of the multi-class problem. If a hierarchical approach is followed, one
can use the estimate (c-1)n* instead.
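The two sizing rules above amount to simple arithmetic; a minimal sketch (function name and sample figures are illustrative, not from the text):

```python
# Sketch: total training-set size for a c-class problem, given n*, the
# per-class size that achieves the desired deviation of the expected error
# estimates in the most unfavourable two-class dichotomy.

def total_samples(c, n_star, hierarchical=False):
    """Return c*n* for a flat multi-class design, or (c-1)*n* when a
    hierarchical approach (a tree of two-class dichotomies) is used."""
    return (c - 1) * n_star if hierarchical else c * n_star

# Illustrative figures only: c = 4 classes, n* = 300 patterns per class.
print(total_samples(4, 300))                     # flat design: 1200
print(total_samples(4, 300, hierarchical=True))  # hierarchical: 900
```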
4.3 Model-Free Techniques
The classifiers presented in the previous sections assumed particular shapes of the
pattern clusters and, sometimes, particular distributions of the feature vectors.
Briefly, a certain model of the distribution of the feature vectors in the feature
space was assumed. In the present section we will present three important model-
free techniques for designing classifiers. These methods do not make any assumptions
about the underlying pattern distributions. They are often called non-parametric
methods, although at least some of them would be better called semi-parametric.
Although all of these methods are model-free, their tuning to the particular
distributions of the feature vectors is still based on statistical considerations.