

A: The ROC curves for ALTV and ASTV are shown in Figure 6.20. The areas under the ROC curve, computed by SPSS with 95% confidence intervals, are 0.709 ± 0.11 and 0.781 ± 0.10 for ALTV and ASTV, respectively. We therefore select the ASTV parameter as the best diagnostic feature.
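
For readers working in R rather than SPSS, the same areas and confidence intervals can be sketched with the pROC package. This is only an illustration under assumed names: ctg is a hypothetical data frame holding the CTG variables ALTV and ASTV together with a binary outcome y, none of which are defined in this form in the text.

   library(pROC)                   # provides roc() and ci.auc()
   r.altv <- roc(ctg$y, ctg$ALTV)  # ROC curve for ALTV (hypothetical data)
   r.astv <- roc(ctg$y, ctg$ASTV)  # ROC curve for ASTV
   ci.auc(r.altv)                  # area under the curve with 95% CI
   ci.auc(r.astv)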


6.5 Feature Selection

As already discussed in section 6.3.3, great care must be exercised in reducing the number of features used by a classifier, in order to maintain a high dimensionality ratio and, therefore, reproducible performance, with error estimates sufficiently near the theoretical value. For this purpose, one may use the hypothesis test methods described in chapters 4 and 5 with the aim of discarding clearly non-useful features at an initial stage of the classifier design. This feature assessment task, while assuring that an information-carrying feature set is indeed used in the classifier, does not guarantee that the whole set is needed. Consider, for instance, that we are presented with a classification problem described by 4 features, x1, x2, x3 and x4, with x1 and x2 perfectly discriminating the classes, and x3 and x4 being linearly dependent on x1 and x2. The hypothesis tests will then find that all features contribute to class discrimination. However, this discrimination could be performed equally well using the alternative sets {x1, x2} or {x3, x4}. Briefly, discarding features with no aptitude for class discrimination is no guarantee against redundant features.
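
This redundancy effect can be made concrete with a small R simulation. The sketch below is not from the text: the data are synthetic, and lda from the MASS package stands in for an arbitrary classifier. Because {x3, x4} is a nonsingular linear transformation of {x1, x2}, and linear discriminant analysis is invariant under such transformations, both subsets yield exactly the same training error.

   library(MASS)                        # provides lda()
   set.seed(0)
   n  <- 100
   cl <- factor(rep(c("A", "B"), each = n))
   x1 <- c(rnorm(n, 0), rnorm(n, 3))    # discriminates the classes
   x2 <- c(rnorm(n, 0), rnorm(n, 3))    # discriminates the classes
   x3 <- x1 + x2                        # linearly dependent on x1, x2
   x4 <- x1 - x2                        # linearly dependent on x1, x2
   d  <- data.frame(cl, x1, x2, x3, x4)
   err <- function(f) {                 # training error of an LDA fit
     mean(predict(lda(f, data = d), d)$class != d$cl)
   }
   err(cl ~ x1 + x2)
   err(cl ~ x3 + x4)                    # same value: {x3, x4} is redundant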
   There is abundant literature on the topic of feature selection (see References). Feature selection uses a search procedure to find a feature subset (model) obeying a stipulated merit criterion. A possible choice for this criterion is minimising Pe, with the disadvantage that the search process then depends on the classifier type. More often, a class separability criterion such as the Bhattacharyya distance or the ANOVA F statistic is used. The Wilks' lambda, defined as the ratio of the determinant of the pooled covariance over the determinant of the total covariance, is also a popular criterion. Physically, it can be interpreted as the ratio between the average class volume and the total volume of all cases. Its value ranges from 0 (complete class separation) to 1 (complete class fusion).
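
As a concrete illustration (not from the text), Wilks' lambda, i.e. |W|/|T| with W the pooled within-class SSP matrix and T the total SSP matrix, can be computed in R either directly or via the standard manova function; the built-in iris data set is used here merely as a stand-in example.

   X <- as.matrix(iris[, 1:4])
   g <- iris$Species
   tot <- crossprod(scale(X, scale = FALSE))    # total SSP matrix
   wit <- Reduce(`+`, lapply(split(as.data.frame(X), g),
            function(s) crossprod(scale(as.matrix(s), scale = FALSE))))
   det(wit) / det(tot)                          # Wilks' lambda, close to 0
                                                # (well-separated classes)
   m <- manova(cbind(Sepal.Length, Sepal.Width,
                     Petal.Length, Petal.Width) ~ Species, data = iris)
   summary(m, test = "Wilks")                   # same statistic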
   As for the search method, the following are popular ones and available in STATISTICA and SPSS:

1. Sequential search (direct)
The direct sequential search corresponds to performing successive feature additions or eliminations to the target set, based on a separability criterion.
   In a forward search, one starts with the feature of most merit and, at each step, revises all the features not yet included in the subset; the one that contributes the most to class discrimination, evaluated through the merit criterion, is then included in the subset and the procedure advances to the next search step. The process goes on until the merit criterion for any candidate feature is below a specified threshold (a sketch of this greedy loop follows).
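
The following R sketch of such a forward search is illustrative only: the wilks helper (a function form of the computation shown earlier), the greedy loop and the min.gain stopping threshold are assumptions of mine; only the criterion itself is taken from the text. At each step the candidate that most lowers Wilks' lambda is added, and the search stops when no candidate improves the criterion by at least the threshold.

   wilks <- function(X, g) {            # Wilks' lambda of a feature subset
     X   <- as.matrix(X)
     tot <- crossprod(scale(X, scale = FALSE))
     wit <- Reduce(`+`, lapply(split(as.data.frame(X), g),
              function(s) crossprod(scale(as.matrix(s), scale = FALSE))))
     det(wit) / det(tot)
   }
   forward.search <- function(data, g, min.gain = 0.01) {
     sel  <- character(0)
     best <- 1                          # lambda = 1: no features selected yet
     repeat {
       cand <- setdiff(names(data), sel)
       if (length(cand) == 0) break
       lam <- sapply(cand, function(f) wilks(data[, c(sel, f)], g))
       if (best - min(lam) < min.gain) break   # merit gain below threshold
       sel  <- c(sel, names(which.min(lam)))   # add the best candidate
       best <- min(lam)
     }
     sel
   }
   forward.search(iris[, 1:4], iris$Species)   # greedily ordered subset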