4.4 Feature Selection 123
Imagine that at the start of the search process (level 1 of the tree) we measure
J(x1) and J(x2), corresponding to discarding features x1 and x2, respectively, and
obtain the values shown in Figure 4.35. We then proceed on the path below x1
(larger J), and continue in the same way until level 3, resulting in the solution of
discarding D={x1, x3, x4}, with J(D)=0.768. At this time we have a lower bound for
our search process, B=0.768; therefore, we can immediately eliminate the sub-trees
starting at {x2} and at {x1, x2} from our search process, since J({x2})<B and
J({x1, x2})<B. We still have to consider the sub-tree starting at {x1, x4}. We
therefore backtrack to this node and find a better solution, where D={x1, x4, x5} is
discarded, and we end up with F={x2, x3}.
The process described here for t=5 and d=2 generalizes readily to any values of
t and d. As can be appreciated, this search algorithm avoids the computational
burden of the exhaustive search by neglecting sub-trees that are not of interest,
namely those that do not satisfy the bound criterion.
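The pruning logic can be sketched as follows. This is a minimal illustration, not the book's implementation: the merit function J, the feature names, and the weights in the usage example are all invented here. The sketch assumes a monotone criterion (discarding a feature never increases J), which is exactly what justifies cutting off a sub-tree as soon as its root scores below the current bound B.

```python
def branch_and_bound(features, J, d):
    # Search all ways of discarding d features, pruning any sub-tree whose
    # root already scores at or below the current bound B. The pruning is
    # valid only if J is monotone: discarding a feature never increases J.
    best = {"B": float("-inf"), "kept": None}

    def search(kept, start, to_discard):
        if to_discard == 0:
            if J(kept) > best["B"]:            # new leaf beats the bound
                best["B"], best["kept"] = J(kept), list(kept)
            return
        if J(kept) <= best["B"]:               # bound test: by monotonicity,
            return                             # no descendant can do better
        # discard one more feature, in increasing position order so that
        # each discard set is visited exactly once
        for i in range(start, len(kept)):
            search(kept[:i] + kept[i + 1:], i, to_discard - 1)

    search(list(features), 0, d)
    return best["kept"], best["B"]

# Toy additive criterion over hypothetical per-feature merits:
weights = {"x1": 0.5, "x2": 0.9, "x3": 0.1, "x4": 0.2, "x5": 0.8}
J = lambda kept: sum(weights[f] for f in kept)
kept, B = branch_and_bound(list(weights), J, 3)
# kept == ["x2", "x5"]: the three least useful features are discarded
```

With an additive criterion the answer is trivially the d smallest weights, but the same routine applies to any monotone J, where the bound test does real work.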
Sub-optimal methods
1. Genetic algorithm search
This is a stochastic search in the feature space guided by the idea of inheriting, at
each search step, good properties of parent subsets found in previous steps. This
search method can be combined with the probabilistic neural networks described in
section 4.3.1. We will describe this search method in section 5.8.
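A toy illustration of the idea (subsets encoded as bit lists, good parents passing their bits to children) can be sketched as below. This is not the specific algorithm of section 5.8; the fitness function, population size, mutation rate, and selection scheme are all arbitrary choices made here for illustration.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, generations=50,
                      p_mut=0.1, seed=0):
    # Toy genetic-algorithm search over feature subsets. A subset is a bit
    # list (1 = feature included); fitness() scores it, higher is better.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability p_mut (mutation)
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children                 # parents survive (elitism)
    return max(pop, key=fitness)

# Hypothetical fitness: how many bits agree with a known "ideal" subset.
target = [1, 0, 1, 1, 0]
def fitness(bits):
    return sum(b == t for b, t in zip(bits, target))

best = ga_feature_select(5, fitness)
```

In a real feature-selection setting, fitness() would evaluate a classifier's merit criterion on the subset encoded by the bit list.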
2. Sequential search (direct)
The direct sequential search corresponds to following just one path of the complete
search tree.
In a backward search the process starts with the whole feature set and, at each
step, the feature that contributes the least to class discrimination is removed. The
process goes on while the merit criterion for the best candidate subset remains
above a specified threshold. Backward search is therefore similar to following just
one path of the branch-and-bound tree (ending up with the solution {x2, x5} in
Figure 4.35).
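Backward elimination can be sketched as below; the merit function J, the per-feature weights, and the threshold are hypothetical values chosen here for illustration, not taken from Figure 4.35.

```python
def backward_search(features, J, threshold):
    # Backward sequential elimination: repeatedly drop the feature whose
    # removal costs the least merit, while the best remaining candidate
    # subset still scores above the threshold.
    kept = list(features)
    while len(kept) > 1:
        # every subset obtained by removing exactly one feature
        candidates = [kept[:i] + kept[i + 1:] for i in range(len(kept))]
        best = max(candidates, key=J)
        if J(best) < threshold:   # any further removal costs too much merit
            break
        kept = best
    return kept

# Toy additive criterion over hypothetical per-feature merits:
weights = {"x1": 0.5, "x2": 0.9, "x3": 0.1, "x4": 0.2, "x5": 0.8}
J = lambda kept: sum(weights[f] for f in kept)
result = backward_search(list(weights), J, threshold=1.5)
# result == ["x2", "x5"]
```

Note that each step commits to a single removal, i.e., follows one path of the tree; unlike branch-and-bound, there is no backtracking.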
In a forward search one starts with the feature of most merit and, at each step,
all the features not yet included in the subset are assessed; the one that contributes
the most to class discrimination, as evaluated by the merit criterion, is then
included in the subset, and the procedure advances to the next search step. The
process goes on until the merit criterion for every candidate feature is below a
specified threshold.
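The forward variant can be sketched in the same style; again J, the weights, and the threshold are illustrative assumptions, not values from the text.

```python
def forward_search(features, J, threshold):
    # Forward sequential selection: at each step add the candidate feature
    # that increases the merit criterion the most, stopping when no
    # candidate improves the merit by at least the threshold.
    selected = []
    remaining = list(features)
    while remaining:
        gains = {f: J(selected + [f]) - J(selected) for f in remaining}
        best = max(remaining, key=gains.get)
        if gains[best] < threshold:   # every candidate falls below threshold
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy additive criterion over hypothetical per-feature merits:
weights = {"x1": 0.5, "x2": 0.9, "x3": 0.1, "x4": 0.2, "x5": 0.8}
J = lambda subset: sum(weights[f] for f in subset)
picked = forward_search(list(weights), J, threshold=0.4)
# picked == ["x2", "x5", "x1"]: features added in decreasing order of gain
```

Because each feature is judged only by its individual gain given the current subset, a pair of features that is strong jointly but weak individually can be missed, which is exactly the "nested" subset problem discussed next.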
Direct sequential search is therefore faster than the branch-and-bound algorithm,
but can miss "nested" feature subsets (like {x4, x5} in Figure 4.35).
3. Sequential search (dynamic)
The problem of "nested" feature subsets is tackled in a dynamic search by
performing a combination of forward and backward searches at each level. Also
known as "plus l-take away r" selection, it represents a compromise, in terms of
computational effort, between the branch-and-bound method and the direct
sequential search. A version of the "plus l-take away r" selection known as