between the targets and features by constructing and relating their latent
structures. The PLS model learns the multidimensional direction in the feature
space that explains the maximum variance in the target space. The latent
structure corresponding to the most variation in the targets is first extracted
and then explained using a latent structure in the feature space. New latent
structures are combined with the original variables to form components. The
number of components m (no more than the number of features) is chosen to
maximally summarize the covariance with the targets. The requirement to build
the PLS model with a number of components m fewer than the number of features
makes the PLS model more suitable than the OLS model when there are high
correlations among the features. The removal of redundant features that are
collinear or not strongly correlated to the variance in the targets generates
a more generalizable and accurate model.
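The latent-structure extraction described above can be illustrated with a short
sketch, not taken from the original study, using scikit-learn's PLSRegression
on synthetic data; the data dimensions and the choice of two components are
illustrative assumptions.

```python
# Minimal PLS sketch: the synthetic X, Y, and n_components=2 are
# illustrative assumptions, not values from the study.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 depth samples, 10 features
Y = X[:, :2] @ rng.normal(size=(2, 3))  # 3 targets driven by 2 features
Y += 0.1 * rng.normal(size=Y.shape)     # additive noise

pls = PLSRegression(n_components=2)
pls.fit(X, Y)

# Latent structures (scores) extracted in the feature and target spaces;
# each successive pair of score vectors captures the remaining covariance.
print(pls.x_scores_.shape)  # (200, 2) feature-space latent scores
print(pls.y_scores_.shape)  # (200, 2) target-space latent scores
```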
   In applying the PLS model, the most important hyperparameter is the
number of components to be generated. In our study, the smallest m that
achieves the best prediction performance is used to build the model. We select
the number of components by testing a range of values and monitoring the
change in model performance. The best performance of the model occurs
when the number of components equals 13, and the model performance does not
change significantly when the number of components is reduced from 13 to
8, which indicates that there are 5 correlated features in the training dataset.
A shallow-learning model that can work with a smaller number of features
avoids the curse of dimensionality, which makes the model development
more robust to noise and computationally efficient and improves the accuracy
of predictions. Unlike OLS, PLS can learn to generate multiple targets in a
single model.
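The component-selection procedure described above can be sketched as a simple
scan over candidate values of m with cross-validation. This is an illustrative
sketch rather than the authors' code; the synthetic X and Y, the fold count,
and the tolerance are assumptions.

```python
# Select the number of PLS components by scanning a range of values and
# monitoring cross-validated performance, as described in the text.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Synthetic placeholders for the training features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(200, 2))

# Mean cross-validated R^2 for each candidate number of components m.
scores = {}
for m in range(1, X.shape[1] + 1):
    pls = PLSRegression(n_components=m)
    scores[m] = cross_val_score(pls, X, Y, cv=5, scoring="r2").mean()

# Smallest m whose score is within a small tolerance of the best score,
# mirroring the "smallest m with the best performance" rule in the text.
best = max(scores.values())
m_opt = min(m for m, s in scores.items() if s >= best - 1e-3)
print(m_opt, scores[m_opt])
```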

2.4.3 Least absolute shrinkage and selection operator (LASSO) model
LASSO learns the linear relationship between the features and targets such
that correlated features are excluded during model development to prevent
overfitting and ensure generalization of the data-driven model. The LASSO
model implements an L1 regularization term that severely penalizes
nonessential or correlated features by forcing their corresponding
coefficients to zero. Unlike the OLS model, the LASSO model learns the linear
relationship in the data by minimizing the SSE and the regularization term
together to ensure the sparsity of the coefficients. The objective function
minimized by the LASSO algorithm is expressed as

$$\min_{w}\ \frac{1}{2n}\,\lVert Xw - Y\rVert_2^2 + \alpha\,\lVert w\rVert_1 \tag{5.7}$$
where w is the coefficient vector comprising the coefficients β associated
with the features, which are the parameters of the LASSO model; X is the
feature matrix; Y is the target vector; n is the number of depth samples in
the training dataset; and the hyperparameter α is the penalty parameter that
controls the strength of the L1 regularization and, thereby, the sparsity of
the learned coefficients.
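As a minimal sketch of Eq. (5.7), scikit-learn's Lasso minimizes exactly this
objective, (1/2n)·||Xw − Y||² + α||w||₁; the synthetic data and the value of α
below are illustrative assumptions, not values from the study.

```python
# LASSO sketch: the L1 penalty drives the coefficients of nonessential
# features to zero, as described in the text.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
# Only 3 of the 10 features actually drive the target.
w_true = np.array([2.0, -1.5, 0.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ w_true + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1)  # alpha is the penalty parameter of Eq. (5.7)
lasso.fit(X, y)

# Coefficients of the 7 nonessential features are shrunk to (near) zero.
print(np.round(lasso.coef_, 2))
```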