Page 303 - Machine Learning for Subsurface Characterization

Classification of sonic wave Chapter  9 265
             FIG. 9.16 Implementation of the DT classifier on a dataset that has two features (X 1 and X 2 ) and
             three classes. Two of the three leaves in the tree are pure leaves. At each node, the DT classifier
             finds the feature and the corresponding feature threshold for the split such that the purity of the
             dataset increases after the split.

             nodes. Each node is split into internal nodes and/or leaves such that the
             purity of the dataset increases after the split; that is, each split should
             separate the dataset into groups whose samples predominantly belong to one
             class. At each node, the algorithm selects a feature and a corresponding
             threshold value such that splitting the node on that feature and threshold
             produces the maximum drop in entropy or impurity. The best-case outcome of a
             split is a pure leaf, which contains samples belonging to only one class. The
             DT algorithm does not require feature scaling. The DT classifier is sensitive
             to noise in the data and to the selection of the training dataset because of
             the high variance of the method. Hyperparameter optimization is required to
             lower the variance, at the cost of higher bias. Conversely, the bias of the
             DT classifier can be reduced, at the cost of increased variance, by allowing
             the tree to grow to a greater depth (i.e., more splits) or by allowing the
             leaves to contain fewer samples, so that more splits are made to obtain pure
             leaf nodes. Nonetheless, the decision tree model is easy to interpret because
             the decision-making process during training and deployment can be followed
             through the tree-like decision structure.
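The bias-variance tradeoff described above can be sketched with scikit-learn's DecisionTreeClassifier on a toy two-feature, three-class dataset similar to Fig. 9.16. The dataset and all hyperparameter values below are illustrative assumptions, not taken from the chapter.

```python
# Minimal sketch: a DT classifier on a synthetic dataset with two
# features and three classes. Hypothetical hyperparameter values.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Three well-separated clusters -> three classes, two features.
X, y = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# max_depth and min_samples_leaf control the bias-variance tradeoff:
# a deeper tree with smaller leaves lowers bias but raises variance.
# criterion="entropy" selects splits by maximum drop in entropy.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                             criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Increasing `max_depth` or decreasing `min_samples_leaf` lets the tree keep splitting toward pure leaves, reducing bias while increasing variance.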

             4.1.4 Random forest (RF) classifier
             RF classifier is an ensemble method that trains several decision trees in
             parallel on bootstrapped data and then aggregates their predictions; the
             combination of bootstrapping and aggregation is jointly referred to as
             bagging (Fig. 9.17). Bootstrapping means that the individual decision trees
             are trained in parallel on different random subsets of the training dataset,
             each using a different subset of the available features. Bootstrapping
             ensures that each individual decision
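The bagging scheme above, bootstrapped trees followed by aggregation, can be sketched with scikit-learn's RandomForestClassifier. The dataset and the parameter values are illustrative assumptions only.

```python
# Minimal sketch of bagging: each tree is trained on a bootstrap
# sample of the data and considers a random subset of features at
# each split; predictions are aggregated across trees.
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Toy two-feature, three-class dataset (hypothetical).
X, y = make_blobs(n_samples=300, centers=3, n_features=2, random_state=1)

# n_estimators trees trained on bootstrap samples (bootstrap=True);
# max_features limits the features considered at each split.
rf = RandomForestClassifier(n_estimators=50, max_features="sqrt",
                            bootstrap=True, random_state=1)
rf.fit(X, y)
n_trees = len(rf.estimators_)
train_accuracy = rf.score(X, y)
```

Because each tree sees a different bootstrap sample and feature subset, the aggregated forest has lower variance than any single decision tree.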