between the targets and features by constructing and relating their latent
structures. The PLS model learns the multidimensional direction in the feature
space that explains the maximum variance in the target space. The latent
structure corresponding to the most variation in the targets is first extracted
and then explained using a latent structure in the feature space. New latent
structures are combined with the original variables to form components. The
number of components m (no more than the number of features) is chosen to
maximally summarize the covariance with the targets. The requirement to build
the PLS model with a number of components m fewer than the number of features
makes the PLS model more suitable than the OLS model when there are high
correlations among the features. The removal of redundant features that are
collinear or not strongly correlated to the variance in the targets generates
a more generalizable and accurate model.
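The latent-structure extraction described above can be illustrated with a short
sketch, not taken from the original study, using scikit-learn's PLSRegression
on synthetic data; the data dimensions and the choice of two components are
illustrative assumptions.

```python
# Minimal PLS sketch: the synthetic X, Y, and n_components=2 are
# illustrative assumptions, not values from the study.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 depth samples, 10 features
Y = X[:, :2] @ rng.normal(size=(2, 3))  # 3 targets driven by 2 features
Y += 0.1 * rng.normal(size=Y.shape)     # additive noise

pls = PLSRegression(n_components=2)
pls.fit(X, Y)

# Latent structures (scores) extracted in the feature and target spaces;
# each successive pair of score vectors captures the remaining covariance.
print(pls.x_scores_.shape)  # (200, 2) feature-space latent scores
print(pls.y_scores_.shape)  # (200, 2) target-space latent scores
```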
   In applying the PLS model, the most important hyperparameter is the
number of components to be generated. In our study, the smallest m that
achieves the best prediction performance is used to build the model. We select
the number of components by testing a range of values and monitoring the
change in model performance. The best performance of the model occurs
when the number of components equals 13, and the model performance does not
change significantly when the number of components is reduced from 13 to
8, which indicates that there are 5 correlated features in the training dataset.
A shallow-learning model that can work with a smaller number of features
avoids the curse of dimensionality, which makes the model development
more robust to noise and computationally efficient and improves the accuracy
of predictions. Unlike OLS, PLS can learn to generate multiple targets in a
single model.
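The component-selection procedure described above can be sketched as a simple
scan over candidate values of m with cross-validation. This is an illustrative
sketch rather than the authors' code; the synthetic X and Y, the fold count,
and the tolerance are assumptions.

```python
# Select the number of PLS components by scanning a range of values and
# monitoring cross-validated performance, as described in the text.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Synthetic placeholders for the training features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(200, 2))

# Mean cross-validated R^2 for each candidate number of components m.
scores = {}
for m in range(1, X.shape[1] + 1):
    pls = PLSRegression(n_components=m)
    scores[m] = cross_val_score(pls, X, Y, cv=5, scoring="r2").mean()

# Smallest m whose score is within a small tolerance of the best score,
# mirroring the "smallest m with the best performance" rule in the text.
best = max(scores.values())
m_opt = min(m for m, s in scores.items() if s >= best - 1e-3)
print(m_opt, scores[m_opt])
```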

2.4.3 Least absolute shrinkage and selection operator (LASSO) model
LASSO learns the linear relationship between the features and targets such
that correlated features are excluded during model development to prevent
overfitting and ensure generalization of the data-driven model. The LASSO
model implements an L1 regularization term that severely penalizes
nonessential or correlated features by forcing their corresponding
coefficients to zero. Unlike the OLS model, the LASSO model learns the linear
relationship in the data by minimizing the SSE and the regularization term
together to ensure the sparsity of the coefficients. The objective function
minimized by the LASSO algorithm is expressed as

$$\min_{w}\ \frac{1}{2n}\,\lVert Xw - Y\rVert_2^2 + \alpha\,\lVert w\rVert_1 \tag{5.7}$$
where w is the coefficient vector comprising the coefficients β associated
with the features, which are the parameters of the LASSO model; X is the
feature matrix; Y is the target vector; n is the number of depth samples in
the training dataset; and the hyperparameter α is the penalty parameter that
controls the strength of the L1 regularization and, thereby, the sparsity of
the learned coefficients.
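As a minimal sketch of Eq. (5.7), scikit-learn's Lasso minimizes exactly this
objective, (1/2n)·||Xw − Y||² + α||w||₁; the synthetic data and the value of α
below are illustrative assumptions, not values from the study.

```python
# LASSO sketch: the L1 penalty drives the coefficients of nonessential
# features to zero, as described in the text.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
# Only 3 of the 10 features actually drive the target.
w_true = np.array([2.0, -1.5, 0.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ w_true + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1)  # alpha is the penalty parameter of Eq. (5.7)
lasso.fit(X, y)

# Coefficients of the 7 nonessential features are shrunk to (near) zero.
print(np.round(lasso.coef_, 2))
```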