                             Example 9.9
                             We show a simple classification tree in Figure 9.10, where we are concerned
                             with only two features. Note that all internal nodes have two children and a
                             splitting rule. The split can occur on either variable, with observations that
                             are less than that value being assigned to the left child and the rest going to
                             the right child. Thus, at node 1, any observation where the first feature is less
                             than 5 would go to the left child. When an observation stops at one of the ter-
                             minal nodes, it is assigned to the corresponding class for that node. We illus-
                             trate these concepts with several cases. Say that we have a feature vector
                             given by x = (4, 6); then passing this down the tree, we get

                                                    node 1 → node 2 ⇒ ω1 .

                             If our feature vector is x = (6, 6), then we travel the tree as follows:

                                            node 1 → node 3 → node 4 → node 6 ⇒ ω2 .

                             For a feature vector given by x = (10, 12), we have

                                                node 1 → node 3 → node 5 ⇒ ω2 .
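
                             To make the traversal concrete, the following MATLAB segment passes a
                             feature vector down a small tree stored as a table of nodes. Only the node 1
                             split (first feature less than 5) is stated above; the node 3 and node 4 split
                             values and the class at node 7 are hypothetical placeholders, chosen only so
                             that the three paths in this example are reproduced, and should not be read
                             as the actual values in Figure 9.10.

                                % Each row is one node:
                                % [splitVar splitVal leftChild rightChild class]
                                % class = 0 marks an internal node; terminal nodes have no children.
                                nodes = [ 1   5   2   3   0 ;  % node 1: split on x(1) < 5
                                          0   0   0   0   1 ;  % node 2: terminal, class 1
                                          1   8   4   5   0 ;  % node 3: hypothetical split x(1) < 8
                                          2  10   6   7   0 ;  % node 4: hypothetical split x(2) < 10
                                          0   0   0   0   2 ;  % node 5: terminal, class 2
                                          0   0   0   0   2 ;  % node 6: terminal, class 2
                                          0   0   0   0   1 ]; % node 7: hypothetical terminal, class 1
                                x = [6 6];       % feature vector to classify
                                t = 1;           % start at the root node
                                while nodes(t,5) == 0             % while the current node is internal
                                   if x(nodes(t,1)) < nodes(t,2)  % splitting rule: x_j < split value
                                      t = nodes(t,3);             % go to the left child
                                   else
                                      t = nodes(t,4);             % go to the right child
                                   end
                                end
                                fprintf('Assigned to class %d at node %d\n',nodes(t,5),t)

                             Running this with x = [6 6] visits nodes 1, 3, 4, and 6 and assigns class 2,
                             matching the second path above; x = [4 6] and x = [10 12] reproduce the
                             other two paths.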




                              We give a brief overview of the steps needed to create a tree classifier and
                             then explain each one in detail. To start the process, we must grow an overly
                             large tree using a criterion that will give us optimal splits for the tree. It turns
                             out that these large trees fit the training data set very well. However, they do
                             not generalize, so the rate at which we correctly classify new patterns is low.
                             The proposed solution [Breiman, et al., 1984] to this problem is to continually
                             prune the large tree using a minimal cost complexity criterion to get a
                             sequence of sub-trees. The final step is to choose a tree that is the ‘right size’
                             using cross-validation or an independent test sample. These three main pro-
                             cedures are described in the remainder of this section. However, to make
                             things easier for the reader, we first provide the notation that will be used to
                             describe classification trees.
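
                             As a rough illustration of these three steps, the sketch below uses the
                             MathWorks Statistics and Machine Learning Toolbox (fitctree, cvloss, and
                             prune) rather than the functions developed in this text; the Fisher iris data
                             serve only as a stand-in learning set.

                                load fisheriris               % stand-in learning set L
                                X = meas;  y = species;
                                % Step 1: grow an overly large tree (one case per leaf allowed);
                                % 'Prune','on' also stores the cost-complexity pruning sequence.
                                bigTree = fitctree(X,y,'MinLeafSize',1,'Prune','on');
                                % Steps 2 and 3: use cross-validation over the pruning sequence of
                                % subtrees to pick the 'right size' tree, then prune to that level.
                                [~,~,~,bestLevel] = cvloss(bigTree,'Subtrees','all','TreeSize','min');
                                finalTree = prune(bigTree,'Level',bestLevel);
                                view(finalTree,'Mode','graph')  % inspect the selected subtree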


                             CLASSIFICATION TREES - NOTATION
                                L   denotes a learning set made up of observed feature vectors and
                                    their class labels.
                                J denotes the number of classes.

                                T is a classification tree.
                                t represents a node in the tree.
