Page 150 -
P. 150
4.6 Tree Classifiers 137
Consider the two-class situation shown in Figure 4.44. Initially, we have a node
with equal proportions of patterns belonging to the two classes. We say that its
impurity is maximal. The right split results in nodes with zero impurity, since they
contain patterns from only one of the classes. The left split, on the contrary,
increases the proportion of cases from one of the classes, therefore decreasing the
impurity, although some impurity is still present.
A convenient way to measure the impurity of a node t in the interval [0, 11 is by
using the following function:
C
i(t)= -x ~(k [ t)log ~(k I t),
k=l
where P(k I t) is the proportion of patterns of class k at node t. This is the well-
known definition of entropy, which measures the "disorder" of the patterns at each
node. For the situation shown in Figure 4.44, we have:
A popular tree-generation algorithm called ID3 generates a binary tree using the
entropy as split criterion. The algorithm starts at the root node, which corresponds
to the whole training set, and progresses along the tree by searching for the feature
and threshold level combination that achieve the maximum decrease of the
impuritytentropy at each node. The generation of splits stops when no significant
decrease of the impurity is achieved. It is common practice to use the individual
feature values of the training set patterns as candidate threshold values. Often, after
generating a tree in an automatic way, some sort of tree pruning must be
performed in order to remove branches of no interest.
The method of exhaustive search for the best univariate splits is usually called
the CART method, pioneered by Breiman, Friedman, Olshen, and Stone (see
Breiman et al., 1993). Instead of the entropy measure one can use other ways to
measure node impurity, for instance methods based on the ANOVA statistic.
Using the CART approach with the ANOVA statistic, available in Statisticu, the
following univariate splits were found for the Breast Tissue dataset:
- First level node: (adi, con} vs. others. Feature I0 was used, with the threshold
10=600.62. This split is achieved with zero misclassified cases.
- Second level, left node: adz' vs. con. Feature PERIM was used, with the
threshold PERIM=1563.8. This split is achieved with one misclassified case
(2.8%).