136 4 Statistical Classification
Details concerning tree-table conversion, choice of rule sets and equivalence of
subtables can be found in Bell (1978).
4.6.2 Automatic Generation of Tree Classifiers
The decision tree used for the Breast Tissue dataset is an example of a binary tree:
at each node a dichotomous decision is made. Binary trees are the most popular type
of tree, especially when a single feature is used at each node, resulting in linear
discriminants that are parallel to the feature axes and easily interpreted by human
experts. They also allow categorical features to be easily incorporated, with node
splits based on a yes/no answer to the question of whether or not a given pattern
belongs to a set of categories. For instance, this type of tree is frequently used in
medical applications, often built as a result of statistical studies of the influence of
individual health factors in a given population.
The design of decision trees can be automated in many ways, depending on the
split criterion used at each node, and the type of search used for best group
discrimination. A split criterion has the form:

d(x) ≥ Δ,

where d(x) is a decision function of the feature vector x and Δ is a threshold.
Usually, linear decision functions are used. In many applications, the split criteria
are expressed in terms of the individual features alone (the so-called univariate
splits).
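The two kinds of decision function mentioned above, univariate (axis-parallel) and general linear, can be sketched as follows; the feature values, weights and threshold are hypothetical, chosen only for illustration:

```python
def univariate_split(x, i, delta):
    """Univariate split: test a single feature x[i] against the threshold delta,
    yielding a discriminant parallel to the feature axes."""
    return x[i] >= delta

def linear_split(x, w, delta):
    """Linear split: d(x) = sum_i w_i * x_i compared against the threshold delta,
    yielding an oblique linear discriminant."""
    return sum(wi * xi for wi, xi in zip(w, x)) >= delta

# Hypothetical two-feature pattern.
x = [1.5, 0.5]
print(univariate_split(x, 0, 1.0))       # True: x1 = 1.5 >= 1.0
print(linear_split(x, [1.0, 1.0], 2.5))  # False: 1.5 + 0.5 = 2.0 < 2.5
```

Each internal tree node evaluates one such criterion and sends the pattern to the left or right child according to the yes/no outcome.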
A key concept regarding split criteria is that of node impurity. The node
impurity is a function of the fractions of patterns belonging to each class at that
node: it is zero when all patterns belong to a single class, and maximal when all
classes are equally represented.
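The text does not single out a particular impurity function; a common choice in automatic tree design (used, for instance, by the CART algorithm) is the Gini index, sketched below:

```python
def gini_impurity(class_fractions):
    """Gini index: i(t) = 1 - sum_k p_k^2, where p_k is the fraction of
    patterns of class k at node t. Zero for a pure node; maximal when
    all classes are equally represented."""
    return 1.0 - sum(p * p for p in class_fractions)

print(gini_impurity([1.0, 0.0]))  # 0.0 -> pure node
print(gini_impurity([0.5, 0.5]))  # 0.5 -> maximum impurity for two classes
```

A split is then chosen so as to maximise the decrease in impurity from the parent node to the weighted average of its children.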
Figure 4.44. Splitting a node with maximum impurity. The left split (x1 ≥ Δ)
decreases the impurity, which is still non-zero; the right split (w1x1 + w2x2 ≥ Δ)
achieves pure nodes.