Page 365 - Computational Statistics Handbook with MATLAB
P. 365

354                        Computational Statistics Handbook with MATLAB

                                                                                    α
                              There is a continuum of values for the complexity parameter  , but if a tree
                             T α()  is a tree that minimizes  R α T()  for a given  α  , then it will continue to
                                                           α
                             minimize it until a jump point for   is reached. Thus, we will be looking for
                                                          α
                             a sequence of complexity values   and the trees that minimize the cost com-
                                                                                , we start pruning
                             plexity measure for each level. Once we have our tree  T 1
                             off the branches that have the weakest link. To find the weakest link, we first
                             define a function on a tree as follows
                                                         (
                                                 Rt() –  RT )
                                                           kt
                                          g t() =  ---------------------------------  t is an internal node,  (9.24)
                                           k
                                                     )
                                                    T kt –  1
                                                                                               .
                             where T kt   is the branch T t   corresponding to the internal node t of subtree T k
                                                                               , we determine the
                             From Equation 9.24, for every internal node in tree  T k
                             value for g k t()  . We define the weakest link t k ∗   in tree T k   as the internal node
                             t that minimizes Equation 9.24,
                                                        ∗
                                                      (
                                                    g k t k ) =  min t g k t(){  . }       (9.25)
                              Once we have the weakest link, we prune the branch defined by that node.
                             The new tree in the sequence is obtained by

                                                          1 =  T k –  ,                    (9.26)
                                                       T k +      T ∗
                                                                   t
                                                                    k
                             where the subtraction in Equation 9.26 indicates the pruning process. We set
                             the value of the complexity parameter to

                                                           1 =  (  ∗  . )                  (9.27)
                                                        α k +  g k t k
                             The result of this pruning process will be a decreasing sequence of trees,

                                                 T max > T 1 > T 2 > … >  T K =  {}  ,
                                                                         t 1

                             along with an increasing sequence of values for the complexity parameter

                                                0 =  α 1 < … <  α k < α k + < … <  α K  .
                                                                   1
                             We need the following key fact when we describe the procedure for choosing
                             the best tree from the sequence of subtrees:







                            © 2002 by Chapman & Hall/CRC
   360   361   362   363   364   365   366   367   368   369   370