Page 365 - Computational Statistics Handbook with MATLAB
P. 365
354 Computational Statistics Handbook with MATLAB
α
There is a continuum of values for the complexity parameter , but if a tree
T α() is a tree that minimizes R α T() for a given α , then it will continue to
α
minimize it until a jump point for is reached. Thus, we will be looking for
α
a sequence of complexity values and the trees that minimize the cost com-
, we start pruning
plexity measure for each level. Once we have our tree T 1
off the branches that have the weakest link. To find the weakest link, we first
define a function on a tree as follows
(
Rt() – RT )
kt
g t() = --------------------------------- t is an internal node, (9.24)
k
)
T kt – 1
.
where T kt is the branch T t corresponding to the internal node t of subtree T k
, we determine the
From Equation 9.24, for every internal node in tree T k
value for g k t() . We define the weakest link t k ∗ in tree T k as the internal node
t that minimizes Equation 9.24,
∗
(
g k t k ) = min t g k t(){ . } (9.25)
Once we have the weakest link, we prune the branch defined by that node.
The new tree in the sequence is obtained by
1 = T k – , (9.26)
T k + T ∗
t
k
where the subtraction in Equation 9.26 indicates the pruning process. We set
the value of the complexity parameter to
1 = ( ∗ . ) (9.27)
α k + g k t k
The result of this pruning process will be a decreasing sequence of trees,
T max > T 1 > T 2 > … > T K = {} ,
t 1
along with an increasing sequence of values for the complexity parameter
0 = α 1 < … < α k < α k + < … < α K .
1
We need the following key fact when we describe the procedure for choosing
the best tree from the sequence of subtrees:
© 2002 by Chapman & Hall/CRC

