Page 366 - Computational Statistics Handbook with MATLAB
P. 366
Chapter 9: Statistical Pattern Recognition 355
For k ≥ 1 , the tree T k is the minimal cost complexity tree for the
interval α k ≤ α < α k + 1 , and
(
T α() = T α ) = T .
k k
PROCEDURE - PRUNING THE TREE
.
1. Start with a large tree T max
by searching through all
2. Find the first tree in the sequence T 1
terminal node pairs. F or each of these pairs, if
()
()
Rt() = Rt L + Rt R , then delete nodes t L and t R .
3. For all internal nodes in the current tree, calculate g k t() as given
in Equation 9.24.
4. The weakest link is the node that has the smallest value for g k t() .
5. Prune off the branch that has the weakest link.
6. Repeat steps 3 through 5 until only the root node is left.
Example 9.12
We continue with the same data set from the previous examples. We apply
the pruning procedure to the large tree obtained in Example 9.11. The prun-
ing function for classification trees is called csprunec. The input argument
is a tree, and the output argument is a cell array of subtrees, where the first
and the last tree corresponds to the root node.
tree corresponds to tree T 1
treeseq = csprunec(tree);
K = length(treeseq);
alpha = zeros(1,K);
% Find the sequence of alphas.
% Note that the root node corresponds to K,
% the last one in the sequence.
for i = 1:K
alpha(i) = treeseq{i}.alpha;
end
α
The resulting sequence for is
alpha = 0, 0.01, 0.03, 0.07, 0.08, 0.10.
We see that as k increases (or, equivalently, the complexity of the tree
decreases), the complexity parameter increases. We plot two of the subtrees
with α = 0.08 has fewer terminal
in Figures 9.14 and 9.15. Note that tree T 5
with α = 0.03 .
nodes than tree T 3
© 2002 by Chapman & Hall/CRC

