

   In a backward search, the process starts with the whole feature set and, at each step, the feature that contributes the least to class discrimination is removed. The process goes on until the merit criterion of every remaining candidate feature is above a specified threshold.
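   As an illustration, the following is a minimal R sketch of the backward search just described. The data frame d, the factor cls with the class labels and the feature names are hypothetical, not the book's code, and for simplicity the merit of each feature is taken as its individual one-way ANOVA F for class discrimination rather than the partial F used by the stepwise procedures discussed below.

   # Backward-search sketch (assumed objects: data frame 'd' with numeric
   # feature columns, factor vector 'cls' with the class labels).
   anova_F <- function(feature, data, cls) {
     # one-way ANOVA F of a single feature for class discrimination
     summary(aov(data[[feature]] ~ cls))[[1]][["F value"]][1]
   }

   backward_search <- function(data, cls, features, F_threshold = 1) {
     repeat {
       Fs <- sapply(features, anova_F, data = data, cls = cls)
       # stop when every remaining feature meets the merit threshold
       if (min(Fs) >= F_threshold || length(features) == 1) break
       features <- features[-which.min(Fs)]  # drop the least discriminating feature
     }
     features
   }

   # Example call (hypothetical feature names):
   # backward_search(d, d$class, c("ART", "PRT", "N"), F_threshold = 1)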

           2. Sequential search (dynamic)
           The problem with the previous search methods is the possible existence of “nested”
           feature subsets that are not detected by direct sequential search. This problem is
           tackled in a dynamic search by performing a combination of forward and backward
           searches at each level, known as “plus l-take away r” selection.
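   A rough R sketch of one such dynamic selection ("plus 1-take away 1") is given below. It is only an illustration under stated assumptions, not the exact algorithm of any of the packages discussed next: Wilks' lambda is computed from the within-group and total SSCP matrices, the F-to-enter and F-to-remove are obtained from the usual ratio of lambdas, and no protection against cycling (re-entering a variable just removed) is included. The data frame d, factor cls and feature names are hypothetical.

   # Plus 1-take away 1 sketch (assumed objects: data frame 'd' with numeric
   # feature columns, factor 'cls' with the class labels).
   wilks_lambda <- function(X, cls) {
     X <- as.matrix(X)
     Tm <- crossprod(scale(X, scale = FALSE))          # total SSCP matrix
     Wm <- Reduce(`+`, lapply(split(as.data.frame(X), cls), function(g) {
       crossprod(scale(as.matrix(g), scale = FALSE))   # within-group SSCP
     }))
     det(Wm) / det(Tm)
   }

   # Partial F for adding one variable to a set of p variables
   # (n cases, g classes): F = ((lambda_p/lambda_p1) - 1)*(n - g - p)/(g - 1)
   partial_F <- function(lambda_p, lambda_p1, n, g, p) {
     ((lambda_p / lambda_p1) - 1) * (n - g - p) / (g - 1)
   }

   plus1_take1 <- function(data, cls, candidates, F_enter = 1, F_remove = 1) {
     selected <- character(0)
     n <- nrow(data); g <- nlevels(cls)
     repeat {
       remaining <- setdiff(candidates, selected)
       if (length(remaining) == 0) break
       lam <- if (length(selected)) wilks_lambda(data[selected], cls) else 1
       # forward step: enter the candidate with the highest F-to-enter
       Fin <- sapply(remaining, function(v)
         partial_F(lam, wilks_lambda(data[c(selected, v)], cls),
                   n, g, length(selected)))
       if (max(Fin) < F_enter) break
       selected <- c(selected, remaining[which.max(Fin)])
       # backward step: remove a feature whose F-to-remove fell below F_remove
       if (length(selected) > 1) {
         lam_full <- wilks_lambda(data[selected], cls)
         Fout <- sapply(selected, function(v)
           partial_F(wilks_lambda(data[setdiff(selected, v)], cls), lam_full,
                     n, g, length(selected) - 1))
         if (min(Fout) < F_remove)
           selected <- setdiff(selected, selected[which.min(Fout)])
       }
     }
     selected
   }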

   Direct sequential search methods can be applied using STATISTICA and SPSS, the latter affording a dynamic search procedure that is in fact a “plus 1-take away 1” selection. As merit criterion, STATISTICA uses the ANOVA F (for all selected features at a given step) with a default value of one. SPSS allows the use of other merit criteria such as the squared Bhattacharyya distance (i.e., the squared Mahalanobis distance of the means).
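   For reference, both merit criteria can be computed directly in R. The sketch below assumes a two-class problem with a numeric feature matrix X and a two-level factor cls (hypothetical objects); it is not the code run internally by either package.

   # Squared Mahalanobis distance between the two class means, using the
   # pooled covariance matrix (assumed objects: matrix 'X', factor 'cls').
   mahalanobis_sq <- function(X, cls) {
     X <- as.matrix(X)
     groups <- split(as.data.frame(X), cls)
     X1 <- as.matrix(groups[[1]]); X2 <- as.matrix(groups[[2]])
     n1 <- nrow(X1); n2 <- nrow(X2)
     Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)  # pooled covariance
     dmean <- colMeans(X1) - colMeans(X2)
     drop(t(dmean) %*% solve(Sp) %*% dmean)
   }

   # One-way ANOVA F of a single feature x (numeric vector):
   # summary(aov(x ~ cls))[[1]][["F value"]][1]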
   It is also common to set a lower limit to the so-called tolerance level, T = 1 – r², which must be satisfied by all features, where r is the multiple correlation factor of one candidate feature with all the others. Highly correlated features are therefore removed. One must be quite conservative, however, in the specification of the tolerance. A value at least as low as 1% is common practice.
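   A minimal sketch of this computation in R, assuming the candidate features are the columns of a data frame X (hypothetical name): the tolerance of each feature is one minus the R² of its regression on all the other features.

   # Tolerance T = 1 - r^2 of each feature, where r is its multiple correlation
   # with all the other features (assumed object: numeric data frame 'X').
   tolerance <- function(X) {
     sapply(names(X), function(v) {
       fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
       1 - summary(fit)$r.squared
     })
   }
   # Features whose tolerance falls below the chosen limit (e.g., 0.01)
   # would be excluded from the candidate set.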


           Example 6.12
           Q: Consider the first two classes of the  Cork Stoppers’ dataset. Perform
           forward and backward searches on the available 10-feature set, using default values
           for the tolerance (0.01) and the ANOVA F (1.0). Evaluate the training set errors of
           both solutions.
A: Figure 6.21 shows the summary listing of a forward search for the first two classes of the cork-stopper data obtained with STATISTICA. Equal priors are assumed. Note that variable ART, with the highest F, entered the model in “Step 1”. The Wilks’ lambda, initially 1, decreased to 0.42 due to the contribution of ART. Next, in “Step 2”, the variable with the highest F contribution for the model containing ART enters the model, decreasing the Wilks’ lambda to 0.4. The process continues until there are no variables with an F contribution higher than 1. The listing also indicates an approximate F for the model, based on the Wilks’ lambda. Figure 6.21 shows that the selection process stopped with a highly significant (p ≈ 0) Wilks’ lambda. The four-feature solution {ART, PRM, NG, RAAR} corresponds to the classification matrix shown before in Figure 6.14b.
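   A comparable forward search can also be run in R with the greedy.wilks function of the klaR package, which performs forward selection driven by Wilks’ lambda. In the sketch below the data frame cork, its factor column CLASS and the feature columns are hypothetical names; note also that klaR’s stopping rule is a significance level (niveau) rather than an F-to-enter of 1, so the selected subset need not coincide with the STATISTICA solution.

   # Forward selection with Wilks' lambda using klaR (assumed objects: data
   # frame 'cork' with factor 'CLASS' and the 10 candidate feature columns).
   library(klaR)

   features <- setdiff(names(cork), "CLASS")
   fs <- greedy.wilks(cork[, features], cork$CLASS, niveau = 0.05)
   fs$results  # variables entered at each step, with Wilks' lambda and approximate F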
   Using a backward search, a solution with only two features (N and PRT) is obtained. It has the performance presented in Example 6.2. Notice that the backward search usually needs to start with a very low tolerance value (in the present case T = 0.002 is sufficient). The dimensionality ratio of this solution is