

r(t) is found by subtracting the maximum conditional probability p(ω_j | t) for the node from 1:

$$ r(t) = 1 - \max_j \{\, p(\omega_j \mid t) \,\} \qquad (9.15) $$
R(t) is the resubstitution estimate of risk for node t. This is

$$ R(t) = r(t)\, p(t) \qquad (9.16) $$

R(T) denotes a resubstitution estimate of the overall misclassification rate for a tree T. This can be calculated using every terminal node in the tree as follows:

$$ R(T) = \sum_{t \in \tilde{T}} r(t)\, p(t) = \sum_{t \in \tilde{T}} R(t) \qquad (9.17) $$

where $\tilde{T}$ denotes the set of terminal nodes of the tree.
                                α   is the complexity parameter.

i(t) denotes a measure of impurity at node t.
Δi(s, t) represents the decrease in impurity and indicates the goodness of the split s at node t. This is given by

$$ \Delta i(s, t) = i(t) - p_R\, i(t_R) - p_L\, i(t_L) \qquad (9.18) $$
p_L and p_R are the proportions of data that are sent to the left and right child nodes by the split s.
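To make these definitions concrete, the following is a minimal MATLAB sketch (not code from the text) that evaluates equations 9.15 through 9.18 at a single node. The class counts and the candidate split are hypothetical, and the Gini index is assumed as the impurity measure i(t):

   % Hypothetical class counts at node t (assumed for illustration).
   nclass = [30 10];            % cases of each class landing in node t
   nt     = sum(nclass);        % total number of cases in node t
   ntot   = 100;                % total number of cases in the data set
   pt     = nt/ntot;            % p(t), proportion of all data in node t
   pcond  = nclass/nt;          % conditional probabilities p(omega_j | t)

   rt = 1 - max(pcond);         % equation 9.15: r(t)
   Rt = rt*pt;                  % equation 9.16: R(t) = r(t) p(t)

   % Goodness of a candidate split s, with the Gini index as i(t).
   gini = @(p) 1 - sum(p.^2);   % impurity of a class-probability vector
   nL = [25 2];                 % assumed class counts sent to the left
   nR = nclass - nL;            % remaining cases go to the right
   pL = sum(nL)/nt;             % proportion of node t sent left
   pR = sum(nR)/nt;             % proportion of node t sent right
   % Equation 9.18: decrease in impurity for split s at node t.
   dis = gini(pcond) - pR*gini(nR/sum(nR)) - pL*gini(nL/sum(nL));

A large value of dis indicates a good split, since the child nodes are purer than the parent node.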


Growing the Tree
The idea behind binary classification trees is to split the d-dimensional space into smaller and smaller partitions, such that the partitions become purer in terms of class membership. In other words, we seek partitions where the majority of the members belong to one class. To illustrate these ideas, we use a simple example where we have patterns from two classes, each containing two features, x_1 and x_2. How we obtain these data is discussed in the following example.
                             Example 9.10
                             We use synthetic data to illustrate the concepts of classification trees. There
                             are two classes, and we generate 50 points from each class. From Figure 9.11,
we see that each class is a two-term mixture of bivariate uniform random
                             variables.
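A sketch of how such data might be generated is given below. The mixture regions (the unit squares in the comments) are assumptions for illustration and are not necessarily the exact regions behind Figure 9.11:

   % Generate 50 points per class, each class a two-term mixture of
   % bivariate uniforms; component regions are assumed for illustration.
   n = 25;                            % points per mixture component
   % Class 1: components over [0,1]x[0,1] and [2,3]x[2,3].
   X1 = [rand(n,2); rand(n,2) + 2];
   % Class 2: components over [2,3]x[0,1] and [0,1]x[2,3].
   X2 = [rand(n,2) + repmat([2 0],n,1); rand(n,2) + repmat([0 2],n,1)];
   % Plot the two classes with different markers.
   plot(X1(:,1),X1(:,2),'o',X2(:,1),X2(:,2),'x')
   xlabel('x_1'), ylabel('x_2')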
