
4.6 Tree Classifiers

  Consider the two-class situation shown in Figure 4.44. Initially, we have a node with equal proportions of patterns belonging to the two classes. We say that its impurity is maximal. The right split results in nodes with zero impurity, since they contain patterns from only one of the classes. The left split, on the contrary, increases the proportion of cases from one of the classes, therefore decreasing the impurity, although some impurity is still present.
  A convenient way to measure the impurity of a node t, on the interval [0, 1], is by using the following function:

    i(t) = -\sum_{k=1}^{C} P(k \mid t) \log P(k \mid t),

where P(k | t) is the proportion of patterns of class k at node t. This is the well-known definition of entropy, which measures the "disorder" of the patterns at each node. For the situation shown in Figure 4.44, using the base-2 logarithm, the root node, with equal proportions of the two classes, has maximal impurity: i(t) = -(0.5 log2 0.5 + 0.5 log2 0.5) = 1 bit. The pure nodes produced by the right split have i(t) = 0, whereas the nodes of the left split retain intermediate impurity values.
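The entropy impurity can be sketched in a few lines of Python; the function name `entropy` and the base-2 logarithm are our own illustrative choices, not taken from the text:

```python
import math

def entropy(proportions):
    # Node impurity i(t) = -sum_k P(k|t) * log2 P(k|t),
    # skipping zero proportions (0 * log 0 is taken as 0).
    s = sum(p * math.log2(p) for p in proportions if p > 0)
    return -s if s else 0.0

print(entropy([0.5, 0.5]))  # maximal two-class impurity: 1.0 bit
print(entropy([1.0, 0.0]))  # pure node: 0.0
```

A node with equal class proportions thus attains the maximal two-class impurity of one bit, while a pure node has zero impurity.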
  A popular tree-generation algorithm called ID3 generates a binary tree using the entropy as split criterion. The algorithm starts at the root node, which corresponds to the whole training set, and progresses along the tree by searching for the feature-and-threshold combination that achieves the maximum decrease of the impurity (entropy) at each node. The generation of splits stops when no significant decrease of the impurity is achieved. It is common practice to use the individual feature values of the training set patterns as candidate threshold values. Often, after generating a tree in an automatic way, some sort of tree pruning must be performed in order to remove branches of no interest.
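This exhaustive search for the feature and threshold with maximal impurity decrease can be sketched as follows; `best_split`, `node_entropy`, and the toy data are our own illustration under these assumptions, not the actual ID3 implementation:

```python
import math

def node_entropy(labels):
    # Entropy impurity of a node, computed from its list of class labels.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values()) or 0.0

def best_split(X, y):
    # Exhaustive search over every feature and every observed feature
    # value as candidate threshold, as described in the text; returns
    # the split with the largest decrease of the weighted impurity.
    n = len(y)
    parent = node_entropy(y)
    best = None  # (impurity decrease, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # degenerate split, skip
            child = (len(left) / n) * node_entropy(left) \
                  + (len(right) / n) * node_entropy(right)
            if best is None or parent - child > best[0]:
                best = (parent - child, j, t)
    return best

# Toy two-class data: the single feature separates the classes perfectly,
# so the best split recovers the full one bit of impurity decrease.
X = [[1.0], [2.0], [8.0], [9.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (1.0, 0, 2.0)
```

In a full tree-growing procedure this search would be applied recursively to the left and right subsets until no significant impurity decrease remains.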
  The method of exhaustive search for the best univariate splits is usually called the CART method, pioneered by Breiman, Friedman, Olshen, and Stone (see Breiman et al., 1993). Instead of the entropy measure one can use other ways of measuring node impurity, for instance methods based on the ANOVA statistic.
  Using the CART approach with the ANOVA statistic, available in Statistica, the following univariate splits were found for the Breast Tissue dataset:

- First-level node: {adi, con} vs. others. Feature I0 was used, with the threshold I0 = 600.62. This split is achieved with zero misclassified cases.
- Second-level, left node: adi vs. con. Feature PERIM was used, with the threshold PERIM = 1563.8. This split is achieved with one misclassified case (2.8%).