The area under the ROC curve is computed by SPSS with a 95% confidence
interval. For the FHR-Apgar data these areas are 0.709 ± 0.11 and 0.781 ± 0.10 for
ABLTV and ABSTV, respectively.
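As a minimal sketch of this kind of computation, the following Python fragment estimates an ROC area with an approximate 95% confidence interval, using scikit-learn's roc_auc_score together with the Hanley-McNeil (1982) standard error formula (a common approximation, not necessarily the one SPSS uses). The scores below are synthetic stand-ins, not the actual FHR-Apgar measurements:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, scores, z=1.96):
    """AUC with an approximate 95% confidence interval based on the
    Hanley-McNeil (1982) standard error of the ROC area."""
    y_true = np.asarray(y_true)
    auc = roc_auc_score(y_true, scores)
    n1 = np.sum(y_true == 1)          # number of positive cases
    n0 = np.sum(y_true == 0)          # number of negative cases
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    se = np.sqrt((auc * (1 - auc)
                  + (n1 - 1) * (q1 - auc ** 2)
                  + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0))
    return auc, (auc - z * se, auc + z * se)

# Synthetic stand-in for a feature such as ABSTV (not the real data)
rng = np.random.default_rng(0)
y = np.r_[np.zeros(40, dtype=int), np.ones(40, dtype=int)]
s = np.r_[rng.normal(0.0, 1.0, 40), rng.normal(1.0, 1.0, 40)]
auc, (lo, hi) = auc_with_ci(y, s)
print(f"AUC = {auc:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```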
Despite some shortcomings, the ROC curve area method is a popular method of
assessing classifier performance. This and an alternative method based on
information theory are described in Metz et al. (1973).
4.4 Feature Selection
As already seen in sections 2.7 and 4.2.3, great care must be exercised in reducing
the number of features used by a classifier, in order to maintain a high
dimensionality ratio and therefore reproducible performance, with error estimates
sufficiently near the theoretical value. For this purpose, several feature assessment
techniques were presented in chapter 2, with the aim of discarding clearly
non-useful features at an initial stage of the PR project.
The feature assessment task, while assuring that an information-carrying feature
set is indeed used in a PR project, does not guarantee that a given classifier needs
the whole set. Consider, for instance, that we are presented with a set of two-
dimensional patterns described by feature vectors consisting of 4 features, x1, x2, x3
and x4, with x3 and x4 being the projections of x1 and x2 onto the eigenvectors of
their covariance matrix.
Assuming that the true dimension of the patterns is not known, statistical tests find
that all features contribute to pattern discrimination. However, this discrimination
could be performed equally well using the alternative sets {x1, x2} or {x3, x4}.
Briefly, discarding features with no aptitude for pattern discrimination is no
guarantee against redundant features, and it is, therefore, good practice to attempt
some sort of feature selection.
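The following minimal sketch illustrates this situation with synthetic data, using univariate ANOVA F tests as a stand-in for the unspecified statistical tests, and linear discriminant analysis to show that either feature pair classifies equally well:

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
W = np.array([[1.0, 0.5], [0.5, 1.0]])        # within-class covariance
Xa = rng.multivariate_normal([0, 0], W, 500)  # class A: features x1, x2
Xb = rng.multivariate_normal([2, 1], W, 500)  # class B: features x1, x2
X12 = np.vstack([Xa, Xb])
y = np.r_[np.zeros(500), np.ones(500)]

# x3, x4: projections onto the eigenvectors of the covariance matrix
# of (x1, x2), i.e. a rotation of the original feature plane
_, vecs = np.linalg.eigh(np.cov(X12, rowvar=False))
X = np.hstack([X12, X12 @ vecs])              # columns: x1, x2, x3, x4

# Univariate F tests: each of the four features looks discriminative
for j in range(4):
    F, p = f_oneway(X[y == 0, j], X[y == 1, j])
    print(f"x{j + 1}: F = {F:.1f}, p = {p:.2g}")

# ...yet either pair alone classifies equally well, since (x3, x4)
# is only a rotation of (x1, x2)
for cols in [(0, 1), (2, 3)]:
    lda = LinearDiscriminantAnalysis().fit(X[:, cols], y)
    print(cols, lda.score(X[:, cols], y))
```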
There is abundant literature on the topic of feature selection; important
references are included in the bibliography. The most popular methods of feature
selection use a search procedure to find a feature subset obeying a stipulated merit
criterion. Let $F_t$ be the original set of $t$ features and $F$ be any subset whose
cardinality is the desired dimensionality $d$, i.e., $|F| = d$. Furthermore, let $J(F)$
represent the merit criterion used in the selection. The problem of feature selection
is to find a subset $F^*$ such that:

$$ J(F^*) = \max_{F \subseteq F_t,\ |F| = d} J(F) . $$
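A direct, if brute-force, reading of this definition is an exhaustive search over all C(t, d) subsets of cardinality d. The sketch below assumes a caller-supplied criterion J; the additive ANOVA F criterion shown is only an illustration, not a prescribed choice:

```python
import numpy as np
from itertools import combinations
from scipy.stats import f_oneway

def select_features(X, y, d, J):
    """Exhaustive search for the subset F* of cardinality d that
    maximises the merit criterion J(F). Feasible only for small t,
    since there are C(t, d) candidate subsets."""
    best, best_score = None, -np.inf
    for F in combinations(range(X.shape[1]), d):
        score = J(X[:, list(F)], y)
        if score > best_score:
            best, best_score = F, score
    return best, best_score

# Illustrative two-class criterion: sum of univariate ANOVA F statistics
def J_anova(XF, y):
    return sum(f_oneway(XF[y == 0, j], XF[y == 1, j])[0]
               for j in range(XF.shape[1]))
```

For large t the combinatorial explosion makes the exhaustive loop impractical, which is what motivates the sequential search methods mentioned further on.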
A possible choice for $J(F)$ is $1 - P_e$, with the disadvantage that the feature
selection process then depends on the chosen type of classifier. More often, a class
separability criterion, such as the Bhattacharyya distance or the ANOVA F statistic, is
used.
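For two classes modelled as Gaussian, the Bhattacharyya distance has a well-known closed-form expression; the following is a minimal sketch of it, which could be supplied as the criterion J in the search sketch above:

```python
import numpy as np

def bhattacharyya(X0, X1):
    """Bhattacharyya distance between two classes, assuming each is
    Gaussian with its own mean and covariance (closed-form expression)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    C0 = np.atleast_2d(np.cov(X0, rowvar=False))
    C1 = np.atleast_2d(np.cov(X1, rowvar=False))
    C = (C0 + C1) / 2                      # average covariance
    dm = m1 - m0
    term1 = dm @ np.linalg.solve(C, dm) / 8
    term2 = 0.5 * np.log(np.linalg.det(C)
                         / np.sqrt(np.linalg.det(C0) * np.linalg.det(C1)))
    return term1 + term2

# Usage as a merit criterion for the exhaustive search sketched above:
# J = lambda XF, y: bhattacharyya(XF[y == 0], XF[y == 1])
```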
As for the search method, there is a broad range of possibilities. In the following
we mention several relevant methods, many of which can be found in available
software products.