Page 366 - Computational Statistics Handbook with MATLAB
P. 366

Chapter 9: Statistical Pattern Recognition                      355

                                    For k ≥  1  , the tree T k   is the minimal cost complexity tree for the
                                    interval α k ≤  α <  α k +  1  , and

                                                              (
                                                     T α() =  T α ) =  T  .
                                                                k     k

                             PROCEDURE - PRUNING THE TREE

                                                           .
                                1. Start with a large tree  T max
                                                                       by searching through all
                                2. Find  the first tree in the sequence  T 1
                                   terminal node  pairs. F or  each of these  pairs, if
                                                  ()
                                           ()
                                   Rt() =  Rt L +  Rt R  , then delete nodes  t L   and t R  .
                                3. For all internal nodes in the current tree, calculate  g k t()   as given
                                   in Equation 9.24.
                                4. The weakest link is the node that has the smallest value for  g k t() .
                                5. Prune off the branch that has the weakest link.
                                6. Repeat steps 3 through 5 until only the root node is left.


                             Example 9.12
                             We continue with the same data set from the previous examples. We apply
                             the pruning procedure to the large tree obtained in Example 9.11. The prun-
                             ing function for classification trees is called csprunec. The input argument
                             is a tree, and the output argument is a cell array of subtrees, where the first
                                                     and the last tree corresponds to the root node.
                             tree corresponds to tree T 1
                                treeseq = csprunec(tree);
                                K = length(treeseq);
                                alpha = zeros(1,K);
                                % Find the sequence of alphas.
                                % Note that the root node corresponds to K,
                                % the last one in the sequence.
                                for i = 1:K
                                    alpha(i) = treeseq{i}.alpha;
                                end
                                                     α
                             The resulting sequence for   is
                                alpha = 0, 0.01, 0.03, 0.07, 0.08, 0.10.

                             We see that as k increases (or, equivalently, the complexity of the tree
                             decreases), the complexity parameter increases. We plot two of the subtrees
                                                                  with α =  0.08   has fewer terminal
                             in Figures 9.14 and 9.15. Note that tree T 5
                                              with α =  0.03  .
                             nodes than tree T 3



                            © 2002 by Chapman & Hall/CRC
   361   362   363   364   365   366   367   368   369   370   371