4.6 Tree Classifiers 131
Figure 4.38. Example of a decision tree classifier for four classes and two levels of decision.
where $P_c(\omega_k \mid l_i)$ are the probabilities of a correct decision at each traversed node $l_i$ in the path leading to $\omega_k$, denoted $T(\omega_k)$. Averaging these $P_c(\omega_k)$ with the prevalences, one obtains the correct recognition rate of the tree:

$$P_c = \sum_{k=1}^{c} P(\omega_k)\, P_c(\omega_k) \qquad (4\text{-}51)$$
At the same time, the correct recognition rate at a node $l_n$, averaged across all the classes that can be reached from it, is written as:

$$P_c(l_n) = \sum_{\omega_k \in \Omega(l_n)} P(\omega_k)\, P_c(\omega_k \mid l_n)$$

where $\Omega(l_n)$ denotes the set of classes reachable from $l_n$.
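As a minimal numerical sketch of these relations (every probability and prevalence below is invented for illustration; the four-class, two-level layout mirrors Figure 4.38), each class rate is the product of the correct-decision probabilities along its path, and the tree rate is their prevalence-weighted average:

```python
# A worked sketch of the recognition-rate formulas for a four-class,
# two-level tree. All numbers are hypothetical, chosen for illustration.

# Prevalences P(w_k) of the four classes.
prevalences = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}

# Pc(w_k | l_i): correct-decision probability for class w_k at each node
# l_i along the path T(w_k) that leads to it (root node first).
path_probs = {
    "w1": [0.95, 0.90],
    "w2": [0.95, 0.85],
    "w3": [0.90, 0.92],
    "w4": [0.90, 0.88],
}

def class_rate(probs):
    """Pc(w_k): product of the correct-decision probabilities along the path."""
    rate = 1.0
    for p in probs:
        rate *= p
    return rate

# Overall tree rate (4-51): prevalence-weighted average of the class rates.
pc_tree = sum(prevalences[k] * class_rate(path_probs[k]) for k in prevalences)
print(f"Pc = {pc_tree:.4f}")
```

The product along each path is what makes the overall rate a nonlinear function of the node-level probabilities, as discussed next.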
This last formula shows that the node performance is a linear function of the
class recognition rates, whereas the total performance expressed by (4-51) is a
nonlinear function. Therefore, contrary to common sense, optimizing the tree
performance at each level does not guarantee that the overall tree performance is
also optimal. The determination of the optimal structure of the tree is an additional
complication. We see, therefore, that designing an optimal tree is not an easy task, and would in general require a prohibitively large exhaustive search in the space of all structures and feature combinations. Search techniques such as dynamic
programming and branch-and-bound methods have been used for this purpose.
Descriptions of these methods can be found in the bibliography.
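The exhaustive approach can be pictured, in its simplest brute-force form, as enumerating candidate tree designs and scoring each with the overall rate (4-51). The sketch below uses invented probabilities throughout, with two hypothetical root splits as the candidate designs:

```python
# Brute-force sketch: enumerate candidate tree designs and score each with
# the overall recognition rate (4-51). All probabilities are hypothetical.

prevalences = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}

# For each candidate design: per class, the correct-decision probabilities
# Pc(w_k | l_i) along the path to that class (root node first).
candidates = {
    "split {w1,w2}/{w3,w4} at root": {
        "w1": [0.96, 0.88], "w2": [0.96, 0.84],
        "w3": [0.91, 0.90], "w4": [0.91, 0.86],
    },
    "split {w1,w3}/{w2,w4} at root": {
        "w1": [0.93, 0.92], "w2": [0.94, 0.89],
        "w3": [0.93, 0.88], "w4": [0.94, 0.90],
    },
}

def tree_rate(paths):
    """Overall rate: prevalence-weighted sum of per-path products."""
    total = 0.0
    for k, probs in paths.items():
        rate = 1.0
        for p in probs:
            rate *= p
        total += prevalences[k] * rate
    return total

best = max(candidates, key=lambda name: tree_rate(candidates[name]))
for name, paths in candidates.items():
    print(f"{name}: Pc = {tree_rate(paths):.4f}")
print("best design:", best)
```

In this contrived example the first design has the better root node (weighted root accuracy 0.945 versus 0.934), yet the second design wins overall (about 0.842 versus 0.822) — a small illustration of why greedy, node-by-node optimization carries no guarantee of overall optimality.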
Keeping in mind that optimizing node performance is no guarantee of optimal tree performance, it must be said that in practical applications one usually follows this "manual" method of tree design, selecting the best solutions of tree structure