

4.6 Tree Classifiers


4.6.1 Decision Trees and Tables

In multi-class classification tasks one is often confronted with the problem that reasonable performances can only be achieved through the use of a large number of features. This requires a very large design set for proper training, probably much larger than what we have available. Also, the feature subset that is the most discriminating for some classes can perform rather poorly for other classes. As an attempt to overcome these difficulties, a "divide and conquer" principle based on multistage classification has been proposed: the so-called decision tree classifiers, also known as hierarchical classifiers, where an unknown pattern is classified into a class using decision functions in successive stages.
At each stage of the tree classifier a simpler problem with a smaller number of features is solved. This has an additional benefit: in practical multi-class problems it is rather difficult to guarantee normal or even symmetric distributions with similar covariance matrices for all the classes, but it may be possible that, by using a multistage approach, these conditions are approximately met, affording optimal classifiers at each stage.
Figure 4.38 shows an example of a simple decision tree with two levels of classification, $l_1$ and $l_2$. At the first level, $l_1$, a set of four classes $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$ is available for classification. However, instead of allowing the assignment of pattern $\mathbf{x}$ to any of the four classes, we limit the assignment to a dichotomic partition of $\Omega$, denoted $\Omega(l_1) = \{\omega_3, \Omega(l_2)\}$. At level $l_2$, the classifier then has to decide which class of $\Omega(l_2)$ the unknown pattern $\mathbf{x}$ belongs to, with $\Omega(l_2) = \{\omega_1, \omega_2, \omega_4\}$.
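To make the two-level structure concrete, the following minimal Python sketch (not from the book; the stage rules, features and thresholds are invented purely for illustration) routes a pattern through the dichotomic split at $l_1$ and, if needed, the three-way decision at $l_2$:

```python
# Sketch of the two-level tree of Figure 4.38. The stage rules below are
# illustrative placeholders; in practice each stage would be a classifier
# trained on the feature subset that best separates its candidate classes.

def stage_l1(x):
    """Level l1: dichotomic decision, omega_3 versus Omega(l2)."""
    return "omega_3" if x[0] < 0.5 else "Omega(l2)"

def stage_l2(x):
    """Level l2: decide among Omega(l2) = {omega_1, omega_2, omega_4}."""
    scores = {"omega_1": -abs(x[1] - 0.2),   # nearest-prototype scores on a
              "omega_2": -abs(x[1] - 0.5),   # second feature (assumed values)
              "omega_4": -abs(x[1] - 0.8)}
    return max(scores, key=scores.get)

def tree_classify(x):
    """Route pattern x through successive stages until a class is reached."""
    decision = stage_l1(x)
    return decision if decision == "omega_3" else stage_l2(x)

print(tree_classify((0.3, 0.6)))  # -> omega_3 (settled at level l1)
print(tree_classify((0.7, 0.6)))  # -> omega_2 (settled at level l2)
```

Note that each stage only consults the features it needs, which is precisely the appeal of the multistage approach.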
Let us study the tree classification performance following the work of Kulkarni (1978), assuming for simplicity that the features used along a path of the tree, in order to reach a certain node, have independent distributions. The pdf relative to a class $\omega_k$ can then be expressed as:
$$
p(\mathbf{x} \mid \omega_k) = \prod_{j=1}^{l} p(\mathbf{x}_j \mid \omega_k),
$$
where the $\mathbf{x}_j$ are the subvectors corresponding to the $l$ independent feature subsets used to classify $\omega_k$ along a certain tree path.
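As a check on this factorization, here is a small Python sketch (assuming, purely for illustration, univariate Gaussian pdfs for each feature subset) that evaluates $p(\mathbf{x} \mid \omega_k)$ as the product of the subset pdfs met along a path:

```python
import math

def normal_pdf(v, mean, var):
    """Univariate Gaussian density, standing in for one subset pdf p(x_j | omega_k)."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def path_pdf(x_subvectors, subset_models):
    """p(x | omega_k) as the product of the l independent subset pdfs along
    the tree path; subset_models are (mean, variance) pairs (assumed)."""
    p = 1.0
    for x_j, (mean, var) in zip(x_subvectors, subset_models):
        p *= normal_pdf(x_j, mean, var)
    return p

# Two one-dimensional feature subsets along the path to omega_k (illustrative).
models_k = [(0.0, 1.0), (2.0, 0.5)]
print(path_pdf([0.1, 1.8], models_k))  # product of the two stage densities
```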
Denoting the probability of a correct classification of $\omega_k$ by $P_c(\omega_k)$, with the assumption that the feature subsets used in the classification are independent, one has, in the same way:
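The displayed formula falls on the following page; "in the same way" points to the same product form, presumably (with $P_{c_j}(\omega_k)$, notation assumed here, the probability of a correct decision at stage $j$ of the path):

$$
P_c(\omega_k) = \prod_{j=1}^{l} P_{c_j}(\omega_k).
$$

For instance, with illustrative stage values $P_{c_1}(\omega_k) = 0.95$ and $P_{c_2}(\omega_k) = 0.90$, the overall probability is $0.95 \times 0.90 = 0.855$: correct-classification probabilities accumulate multiplicatively along the path, so deeper trees trade per-stage simplicity against a compounded risk of misclassification.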