Page 165 - Machine Learning for Subsurface Characterization
Robust geomechanical characterization Chapter 5 139
FIG. 5.2 Bar plot of the estimated coefficients β for the LASSO model.
2.4.5 Multivariate adaptive regression splines (MARS) model
The main advantages of linear models are their ease and speed of computation and
the intuitive interpretation of their coefficients/parameters.
However, the strong assumption of linearity limits the predictive
accuracy of linear models when the underlying relationship is nonlinear. MARS models the nonlinear relationship
between features and targets by splitting the feature space into subspaces
and then learning a linear relationship between features and targets for each
of the subspaces. MARS uses a divide-and-conquer strategy in which the
training datasets are partitioned into separate piecewise linear segments
(splines) of differing gradients (slopes). These piecewise linear segments (or
curves), also known as basis functions $B_q(x)$, result in a flexible model that
can handle both linear and nonlinear behavior. The points of connection $C_q$
between the piecewise segments are called knots. By relating the features
and targets using multiple independent linear regressions, the model can
capture the nonlinear trends in the dataset. MARS assesses each data point
of each feature as a candidate knot that partitions the original feature space into two
new subspaces. A separate linear model is then fitted in each subspace, and the
candidate feature and knot that result in the smallest error are selected.
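The knot search described above can be illustrated with a minimal sketch. It is not the implementation used in this chapter: it assumes a single feature, a single knot, synthetic noise-free data, and a pair of mirrored hinge functions max(0, x - c) and max(0, c - x) as the basis functions on either side of a candidate knot c. The names hinge_pair and best_single_knot are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def hinge_pair(x, knot):
    """Return the mirrored hinge functions max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def best_single_knot(x, y):
    """Try each interior data value as a candidate knot; keep the smallest-error one."""
    best = None
    # Skip the two extreme values, where one of the hinges is identically zero.
    for knot in np.unique(x)[1:-1]:
        right, left = hinge_pair(x, knot)
        # Intercept plus one linear piece on each side of the candidate knot.
        design = np.column_stack([np.ones_like(x), right, left])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))  # sum of squared errors
        if best is None or sse < best[1]:
            best = (float(knot), sse, coef)
    return best

# Synthetic data (assumption, for illustration only): the slope changes at x = 4.
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 4.0, 2.0 * x, 8.0 - 0.5 * (x - 4.0))

knot, sse, coef = best_single_knot(x, y)
print(f"selected knot = {knot:.2f}, SSE = {sse:.3e}")
print("coefficients [intercept, right hinge, left hinge] =", np.round(coef, 3))
```

In this sketch the error is the sum of squared residuals on the training data. A full MARS procedure would repeat the search over every feature and add knots one at a time, which is the step the sketch leaves out.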
This partitioning is continued until many knots are found, producing a
highly nonlinear pattern, which is a collection of linear models for
individual subspaces. An increase in the number of knots allows a better fit to the
training dataset; however, the learned relationship may not generalize well to
new, unseen data. Knots that do not contribute significantly to predictive
accuracy can be removed using a process known as “pruning.” MARS