Page 273 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
254 6 Statistical Classification
In a backward search, the process starts with the whole feature set and, at each
step, the feature that contributes the least to class discrimination is removed. The
process goes on until the merit criterion for every remaining feature is above a
specified threshold.
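As a concrete illustration, a backward search with a hypothetical per-feature merit function (here the one-way ANOVA F statistic, computed on invented two-class data rather than on any of the book's datasets) could be sketched as follows:

```python
def anova_F(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def backward_search(X, y, merit=anova_F, threshold=1.0):
    """Drop the least discriminating feature until all survivors pass the threshold."""
    selected = list(range(len(X[0])))
    while len(selected) > 1:
        scores = {j: merit([row[j] for row in X], y) for j in selected}
        worst = min(scores, key=scores.get)
        if scores[worst] >= threshold:
            break  # every remaining feature meets the merit criterion
        selected.remove(worst)
    return selected
```

With two synthetic features, one well separated between the classes and one pure noise, the search keeps only the discriminating one.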
2. Sequential search (dynamic)
The problem with the previous search methods is the possible existence of “nested”
feature subsets that are not detected by direct sequential search. This problem is
tackled in a dynamic search by performing a combination of forward and backward
searches at each level, known as “plus l-take away r” selection.
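A sketch of this idea, assuming a caller-supplied subset merit function; the toy merit table below is invented purely to exhibit a "nested" pair of features that a plain forward search would miss, and l must exceed r for the selected set to grow:

```python
def plus_l_take_away_r(n_features, subset_merit, target, l=2, r=1):
    """Each level adds the l best features, then drops the r least useful (needs l > r)."""
    selected = []
    while len(selected) < target:
        for _ in range(l):  # forward phase: add the feature that helps most
            rest = [j for j in range(n_features) if j not in selected]
            if not rest:
                break
            selected.append(max(rest, key=lambda j: subset_merit(selected + [j])))
        for _ in range(r):  # backward phase: drop the feature whose removal hurts least
            if len(selected) <= 1:
                break
            drop = max(selected, key=lambda j: subset_merit([k for k in selected if k != j]))
            selected.remove(drop)
    return selected

# Toy merit table: features 1 and 2 are only strong together (a "nested" subset),
# while feature 0 looks best in isolation.
MERIT = {frozenset(): 0.0, frozenset({0}): 3.0, frozenset({1}): 1.0, frozenset({2}): 1.0,
         frozenset({0, 1}): 4.0, frozenset({0, 2}): 4.0, frozenset({1, 2}): 10.0,
         frozenset({0, 1, 2}): 10.5}

def toy_merit(subset):
    return MERIT[frozenset(subset)]
```

On this toy table a purely forward search with a two-feature target would settle on a subset containing feature 0 (merit 4), whereas "plus 2-take away 1" backtracks out of feature 0 and finds {1, 2} (merit 10).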
Direct sequential search methods can be applied using STATISTICA and SPSS,
the latter affording a dynamic search procedure that is in fact a “plus 1-take away
1” selection. As merit criterion, STATISTICA uses the ANOVA F (for all selected
features at a given step) with default value of one. SPSS allows the use of other
merit criteria such as the squared Bhattacharyya distance (i.e., the squared
Mahalanobis distance of the means).
It is also common to set a lower limit to the so-called tolerance level, T = 1 – r²,
which must be satisfied by all features, where r is the multiple correlation factor of
one candidate feature with all the others. Highly correlated features are therefore
removed. One must be quite conservative, however, in the specification of the
tolerance. A value at least as low as 1% is common practice.
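The tolerance of a candidate feature can be estimated, in principle, by regressing it on the remaining features: T = 1 – r² is then the residual sum of squares divided by the total sum of squares. A self-contained sketch, using plain least squares via the normal equations on invented data:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def tolerance(X, j):
    """T = 1 - r^2, r being the multiple correlation of feature j with the others."""
    y = [row[j] for row in X]
    Z = [[1.0] + [v for k, v in enumerate(row) if k != j] for row in X]  # intercept column
    p = len(Z[0])
    A = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    c = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    beta = solve(A, c)
    y_hat = [sum(b * v for b, v in zip(beta, z)) for z in Z]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return ss_res / ss_tot  # equals 1 - r^2
```

A feature that is an exact linear combination of the others has tolerance 0 and would be rejected by any practical lower limit.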
Example 6.12
Q: Consider the first two classes of the Cork Stoppers’ dataset. Perform
forward and backward searches on the available 10-feature set, using default values
for the tolerance (0.01) and the ANOVA F (1.0). Evaluate the training set errors of
both solutions.
A: Figure 6.21 shows the summary listing of a forward search for the first two
classes of the cork-stopper data obtained with STATISTICA. Equal priors are
assumed. Note that variable ART, with the highest F, entered the model in “Step 1”.
The Wilks’ lambda, initially 1, decreased to 0.42 due to the contribution of
ART. Next, in “Step 2”, the variable with the highest F contribution for the model
containing ART enters the model, decreasing the Wilks’ lambda to 0.4. The
process continues until there are no variables with an F contribution higher than 1.
The listing also indicates an approximate F for the model, based on the Wilks’
lambda. Figure 6.21 shows that the selection process stopped with a highly
significant (p ≈ 0) Wilks’ lambda. The four-feature solution {ART, PRM, NG,
RAAR} corresponds to the classification matrix shown before in Figure 6.14b.
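For reference, the Wilks’ lambda of a feature subset is the ratio of the determinants of the within-class and total scatter matrices, Λ = |W| / |T|, which decreases towards 0 as class separation improves. A minimal sketch on invented two-class data (not the cork-stopper values):

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            d = -d
        if M[col][col] == 0.0:
            return 0.0
        d *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return d

def wilks_lambda(X, y):
    """Lambda = |W| / |T|: within-class over total scatter determinants."""
    n, d = len(X), len(X[0])
    grand = [sum(row[k] for row in X) / n for k in range(d)]
    means = {}
    for c in set(y):
        rows = [row for row, lab in zip(X, y) if lab == c]
        means[c] = [sum(row[k] for row in rows) / len(rows) for k in range(d)]
    W = [[0.0] * d for _ in range(d)]
    T = [[0.0] * d for _ in range(d)]
    for row, lab in zip(X, y):
        for a in range(d):
            for b in range(d):
                W[a][b] += (row[a] - means[lab][a]) * (row[b] - means[lab][b])
                T[a][b] += (row[a] - grand[a]) * (row[b] - grand[b])
    return det(W) / det(T)
```

In the univariate case this reduces to SSW/SST, so well-separated class means drive Λ towards 0, as in the listing’s first step.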
Using a backward search, a solution with only two features (N and PRT) is
obtained. It has the performance presented in Example 6.2. Notice that the
backward search usually needs to start with a very low tolerance value (in the
present case T = 0.002 is sufficient). The dimensionality ratio of this solution is