Page 305 - Machine Learning for Subsurface Characterization


Classification of sonic wave Chapter 9 267

FIG. 9.18 Implementation of an AdaBoost classifier on a dataset that has two features and two
classes. Weak learner #2 corrects the mistakes made by weak learner #1, such that the decision
boundaries learned by the two weak learners can be combined to form a strong learner. In this case,
each weak learner is a decision tree, and the AdaBoost classifier (i.e., the strong learner) combines
the weak learners in series.

The weight of a sample misclassified by the previous tree is boosted so that
the subsequent tree focuses on correctly classifying that sample. The
classification accuracy on the training data increases as more weak classifiers
are added in series to the model; however, this may lead to severe overfitting
and a drop in generalization capability. AdaBoost is suited for imbalanced
datasets but underperforms in the presence of noise, because the weights of
noisy, repeatedly misclassified samples keep growing. AdaBoost is slower to
train than the RF classifier because its weak learners must be built one after
another rather than in parallel. Hyperparameter optimization of AdaBoost is
also much more difficult than that of the RF classifier (Fig. 9.18).
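
The weight-boosting step described above can be illustrated with a minimal sketch of discrete AdaBoost. Several choices here are assumptions made for illustration and do not come from the text: the weak learner is a depth-1 decision tree (a "stump"), the labels are coded as -1 and +1, the toy dataset has a single feature for brevity, and all function names are hypothetical.

```python
import math

# Hypothetical toy data (not from the source): one feature, labels in {-1, +1}.
X = [[float(v)] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(stump, row):
    """Depth-1 tree: predict `polarity` if row[feature] <= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if row[feature] <= threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def boost_weights(w, y, pred, alpha):
    """Raise the weights of misclassified samples, lower the rest, renormalize."""
    new_w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, pred)]
    total = sum(new_w)
    return [wi / total for wi in new_w]


def adaboost_fit(X, y, n_rounds):
    """Train up to n_rounds stumps in series; return [(alpha, stump), ...]."""
    w = [1.0 / len(X)] * len(X)          # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        if err >= 0.5:                   # no better than chance: stop adding learners
            break
        err = max(err, 1e-10)            # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)   # vote of this weak learner
        ensemble.append((alpha, stump))
        pred = [stump_predict(stump, row) for row in X]
        w = boost_weights(w, y, pred, alpha)
    return ensemble


def adaboost_predict(ensemble, row):
    """Strong learner: sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def training_errors(ensemble, X, y):
    """Number of training samples the ensemble misclassifies."""
    return sum(adaboost_predict(ensemble, row) != yi for row, yi in zip(X, y))


for n_rounds in (1, 2, 3):
    print(n_rounds, training_errors(adaboost_fit(X, y, n_rounds), X, y))
```

On this toy dataset, the number of training errors is 3 with one stump, 3 with two stumps, and 0 with three stumps. No single stump can separate the classes, but the weighted combination can. The sketch reports training error only, so it says nothing about the overfitting risk noted above.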

4.1.6 Naïve Bayes (NB) classifier
The Naïve Bayes classifier is a probabilistic classifier based on Bayes’ theorem.
It assumes that the features are independent of one another and do not
interact, such that each feature contributes independently and equally to the
probability that a sample belongs to a specific class. The NB classifier is
simple to implement, computationally fast, and performs well on large datasets
having high dimensionality. It is well suited to real-time applications and is
not sensitive to noise. The NB classifier processes the training dataset to
calculate the class probabilities P(y_i) and the conditional probabilities, each
of which is the frequency of a feature value among instances of a given class
divided by the number of instances with that class value. The NB classifier
performs best when correlated features are removed because correlated features
