
136    4 Statistical Classification

                                     Details concerning tree-table conversion, choice of rule sets and equivalence of
                                   subtables can be found in (Bell, 1978).


                                   4.6.2  Automatic Generation of Tree Classifiers

                                   The decision tree used for the Breast Tissue dataset is an example of a binary tree:
                                   at each node a dichotomic decision is made. Binary trees are the most popular type
                                   of tree, particularly when a single feature is used at each node, resulting in linear
                                   discriminants that are parallel to the feature axes and easily interpreted by human
                                   experts. They also allow categorical features to be easily incorporated, with node
                                   splits based on a yes/no answer to the question of whether or not a given pattern
                                   belongs to a set of categories. Trees of this type are frequently used in
                                   medical applications, often built as a result of statistical studies of the influence of
                                   individual health factors in a given population.
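
A yes/no split on a categorical feature can be sketched as a simple set-membership test. The feature name `blood_group` and the category values below are hypothetical, chosen only for illustration; they do not come from the Breast Tissue dataset.

```python
def categorical_split(pattern, feature, categories):
    """Return True (the 'yes' branch) if the pattern's value for `feature`
    belongs to the given set of categories, otherwise False ('no' branch)."""
    return pattern[feature] in categories


# Hypothetical patterns with a single categorical feature.
patterns = [
    {"blood_group": "A"},
    {"blood_group": "O"},
    {"blood_group": "AB"},
]

# Node question: "does the pattern belong to the set {A, AB}?"
yes_set = {"A", "AB"}
branches = [categorical_split(p, "blood_group", yes_set) for p in patterns]
print(branches)
```

Each pattern is routed to one of two children, so the tree stays binary regardless of how many categories the feature has.
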
                                     The design of decision trees can be automated in many ways, depending on the
                                   split criterion used at each node and the type of search used for best group
                                   discrimination. A split criterion has the form:

                                       $d(\mathbf{x}) \geq \Delta$

                                   where $d(\mathbf{x})$ is a decision function of the feature vector $\mathbf{x}$ and $\Delta$ is a threshold.
                                   Usually, linear decision functions are used. In many applications, the split criteria
                                   are expressed in terms of the individual features alone (the so-called univariate
                                   splits).
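
As a minimal sketch, a split criterion can be read as a threshold test on a linear decision function: patterns with d(x) at or above the threshold go to one child, the rest to the other. The function names, weights, thresholds and sample points below are assumptions made for illustration. A univariate split is the special case where only one weight is non-zero.

```python
def linear_decision(x, w):
    """Linear decision function d(x) = w . x (no bias term in this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def split_node(patterns, w, threshold):
    """Partition patterns into (d(x) >= threshold, d(x) < threshold)."""
    above = [x for x in patterns if linear_decision(x, w) >= threshold]
    below = [x for x in patterns if linear_decision(x, w) < threshold]
    return above, below


# Illustrative two-feature patterns (x1, x2).
patterns = [(0.2, 0.9), (0.8, 0.1), (0.6, 0.7), (0.1, 0.3)]

# Univariate split: only the first feature is used, i.e. x1 >= 0.5.
uni_above, uni_below = split_node(patterns, w=(1.0, 0.0), threshold=0.5)

# Multivariate split on a linear combination: x1 + x2 >= 1.0.
lin_above, lin_below = split_node(patterns, w=(1.0, 1.0), threshold=1.0)

print(uni_above, uni_below)
print(lin_above, lin_below)
```

The univariate split draws a boundary parallel to a feature axis, whereas the second split draws an oblique line in the (x1, x2) plane.
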
                                     A key concept regarding split criteria is node impurity. The impurity of a node
                                   is a function of the fraction of patterns belonging to a specific class at that
                                   node.
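
To make the idea concrete, the sketch below computes an impurity value from the class counts at a node. It assumes the entropy measure, which is one common choice; the passage above does not fix a particular impurity function, and the name `entropy_impurity` is hypothetical.

```python
import math


def entropy_impurity(class_counts):
    """Entropy (in bits) of the class fractions at a node.

    class_counts: number of patterns of each class present at the node.
    Returns 0.0 for a pure (or empty) node.
    """
    total = sum(class_counts)
    if total == 0:
        return 0.0
    impurity = 0.0
    for count in class_counts:
        if count > 0:  # classes with zero patterns contribute nothing
            p = count / total
            impurity -= p * math.log2(p)
    return impurity


balanced = entropy_impurity([5, 5])   # two classes in equal proportion
skewed = entropy_impurity([8, 2])     # one class dominates
pure = entropy_impurity([10, 0])      # a single class only
print(balanced, skewed, pure)
```

Under this measure a two-class node is most impure when both classes are equally represented, and the impurity falls to zero when only one class remains.
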
















                                     Figure 4.44. Splitting a node with maximum impurity. The left split ($x_1 \geq \Delta$)
                                   decreases the impurity, which is still non-zero; the right split ($w_1 x_1 + w_2 x_2 \geq \Delta$)
                                   achieves pure nodes.
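
The situation described in the caption can be reproduced numerically. The sketch below is not the data of the figure: the points, weights and thresholds are invented for illustration. The Gini index is assumed as the impurity measure, and the impurity of a split is taken as the size-weighted average of the child impurities.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(points, labels, w, threshold):
    """Size-weighted Gini impurity of the two children produced by the
    split w1*x1 + w2*x2 >= threshold."""
    above, below = [], []
    for (x1, x2), y in zip(points, labels):
        side = above if w[0] * x1 + w[1] * x2 >= threshold else below
        side.append(y)
    n = len(labels)
    return (len(above) * gini(above) + len(below) * gini(below)) / n


# Invented two-class data, separable by the oblique line x1 + x2 = 1.
points = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.1), (0.2, 0.6),
          (0.9, 0.8), (0.7, 0.6), (0.4, 0.9), (0.8, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

parent = gini(labels)                                           # maximum for two classes
univariate = split_impurity(points, labels, (1.0, 0.0), 0.5)    # x1 >= 0.5
oblique = split_impurity(points, labels, (1.0, 1.0), 1.0)       # x1 + x2 >= 1
print(parent, univariate, oblique)
```

With these points the axis-parallel split lowers the impurity without eliminating it, while the split on the linear combination leaves both children pure.
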