Page 363 - Computational Statistics Handbook with MATLAB

P. 363

352 Computational Statistics Handbook with MATLAB

e
reee
uning
uning
Pr PPrr uningthethe Tr TTrr eeee
PruningthetheT
Recall that the classification error for a node is given by Equation 9.15. If we
grow a tree until each terminal node contains observations from only one
class, then the error rate will be zero. Therefore, if we use the classification
error as a stopping criterion or as a measure of when we have a good tree,
then we would grow the tree until there are pure nodes. However, as we men-
tioned before, this procedure over fits the data and the classification tree will
not generalize well to new patterns. The suggestion made in Breiman, et al.
, and then to find a
[1984] is to grow an overly large tree, denoted by T max
nested sequence of subtrees by successively pruning branches of the tree. The
best tree from this sequence is chosen based on the misclassification rate esti-
mated by cross-validation or an independent test sample. We describe the
two approaches after we discuss how to prune the tree.
The pruning procedure uses the misclassification rates along with a cost for
the complexity of the tree. The complexity of the tree is based on the number
of terminal nodes in a subtree or branch. The cost complexity measure is
defined as

)
R α T() = RT() + α T ; α ≥ . 0 (9.21)

We look for a tree that minimizes the cost complexity given by Equation 9.21.
α
The is a parameter that represents the complexity cost per terminal node.
If we have a large tree where every terminal node contains observations from
only one class, then RT() will be zero. However, there will be a penalty paid
because of the complexity, and the cost complexity measure becomes

)
R α T() = α T .
If α is small, then the penalty for having a complex tree is small, and the
resulting tree is large. The tree that minimizes R α T() will tend to have few
α
nodes and large .
Before we go further with our explanation of the pruning procedure, we
of a tree
need to define what we mean by the branches of a tree. A branch T t
T consists of the node and all its descendent nodes. When we prune or
t
delete this branch, then we remove all descendent nodes of , leaving the
t
t
branch root node . For example, using the tree in Figure 9.10, the branch cor-
responding to node 3 contains nodes 3, 4, 5, 6, and 7, as shown in Figure 9.13.
If we delete that branch, then the remaining nodes are 1, 2, and 3.
Minimal complexity pruning searches for the branches that have the weak-
est link, which we then delete from the tree. The pruning process produces a
sequence of subtrees with fewer terminal nodes and decreasing complexity.
. We are
We start with our overly large tree and denote this tree as T max
searching for a finite sequence of subtrees such that

358 359 360 361 362 363 364 365 366 367 368