Page 136 -
P. 136

4.4 Feature Selection   123


                             Imagine that at the start of  the search process (level  1  of  the tree) we measure
                           J(xl) and J(x2), corresponding  to  discarding features x,  and x2, respectively, and
                           obtain the  values  shown  in  Figure 4.35. We  then  proceed on  the path  below x,
                           (larger 4, and continue in  the same way until level 3, resulting in  the solution of
                           discarding D=(xl, x3, x4), with J(D)=0.768. At this time we have a lower bound for
                           our search process, B=0.768, therefore we can immediately eliminate the sub-trees
                           starting at  {x2) and at (x,, xZ), from our search process, since J((x2))<B and J((xl,
                           x2))<B. We still  have  to  consider  the  sub-tree  starting at  {xl,  x4). We  therefore
                           backtrack to this node and find a better solution, where D=(xl, x4, x5) is discarded,
                           and we end up with F=(x2,  ~3).
                             The process described here for t=5  and  d=2 has  an  obvious generalization  for
                           any  value  of  t  and  d.  As  can  be  appreciated,  this  search  algorithm  avoids  the
                           computational burden of the exhaustive search by  neglecting sub-trees that are not
                            of interest, those that do not satisfy a bound criterion.

                            Sub-optimal methods

                            1. Genetic algorithm search
                            This is a stochastic search in  the feature space guided by  the idea of  inheriting, at
                            each  search step, good properties of  parent subsets found in  previous steps. This
                            search method can be combined with the probabilistic neural networks described in
                            section 4.3.1. We will describe this search method in section 5.8.

                            2. Sequential search (direct)
                            The direct sequential search corresponds to following just one path of the complete
                            search tree.
                              In a backward search the process starts with the whole feature set and, at each
                            step, the feature that contributes the least to class discrimination is removed. The
                            process  goes  on  until  the  merit  criterion  for  any  candidate  feature  is  above  a
                            specified threshold. Backward search is therefore similar to following just one path
                            of the branch-and-bound tree (ending up with solution (xZ, x5) in Figure 4.35).
                              In  afonvard  search one starts with the feature of  most merit and, at each step,
                            all the features not yet included in the subset are revised; the one that contributes
                            the  most  to  class  discrimination  is  evaluated  through  the  merit  criterion.  This
                            feature is then included in the subset and the procedure advances to the next search
                            step. The  process  goes  on  until  the  merit  criterion  for  any  candidate  feature  is
                            below a specified threshold.
                              Direct sequential search is therefore faster than the branch-and-bound algorithm,
                             but can miss "nested" feature subsets (like {x4, x5} in Figure 4.35).

                             3. Sequential search (dynamic)
                             The  problem  of  "nested"  feature  subsets  is  tackled  in  a  dynamic  search  by
                             performing a combination of  forward and backward searches at each level. Also
                             known  as "plus I-take away r" selection, it represents a compromise in  terms of
                             computational  effort  between  the  branch  and  bound  method  and  the  direct
                             sequential  search.  A  version  of  the  "plus  I-take  away  r"  selection  known  as
   131   132   133   134   135   136   137   138   139   140   141