Page 159 - Machine Learning for Subsurface Characterization



            where
            $$\mathrm{RSS}_j = \sum_{i=1}^{n} \left( y^{p}_{i,j} - y^{m}_{i,j} \right)^{2} \tag{5.3}$$
            and
            $$\mathrm{TSS}_j = \sum_{i=1}^{n} \left( y^{m}_{i,j} - \bar{y}_j \right)^{2} \tag{5.4}$$
            where n is the total number of depths for which the DTC and DTS logs need to be
            synthesized; j = 1 indicates the DTC log and j = 2 indicates the DTS log;
            i represents a specific depth; $y^{p}_{i,j}$ is the sonic log j predicted at depth i;
            $y^{m}_{i,j}$ is the sonic log j measured at depth i; and $\bar{y}_j$ is the mean of
            sonic log j measured at all depths in the training or testing dataset.
            $\mathrm{RSS}_j$ is the sum of squares of the residuals, and $\mathrm{TSS}_j$ is the
            total sum of squares, which is proportional to the variance of the corresponding
            sonic log j.
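
            The two sums can be checked numerically with a few lines of Python. The
            sketch below is illustrative only: the function names and the four-depth
            toy log are hypothetical, TSS is taken about the mean of the measured log,
            and the two sums are combined in the standard form of the coefficient of
            determination, R2 = 1 - RSS/TSS, which is assumed here.

```python
def rss(measured, predicted):
    """Residual sum of squares between predicted and measured log values."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def tss(measured):
    """Total sum of squares of the measured log about its own mean."""
    mean = sum(measured) / len(measured)
    return sum((m - mean) ** 2 for m in measured)


def r2(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - RSS/TSS."""
    return 1.0 - rss(measured, predicted) / tss(measured)


# Hypothetical toy log at four depths (not data from the study).
measured_log = [1.0, 2.0, 3.0, 4.0]
predicted_log = [1.1, 1.9, 3.2, 3.8]

score = r2(measured_log, predicted_log)
```

            In a two-log setting, the same function would be called once for the DTC
            log (j = 1) and once for the DTS log (j = 2).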


            2.4 Shallow-learning regression models
            Six shallow-learning models synthesize the DTC and DTS logs by processing 13
            “easy-to-acquire” logs. Each trained shallow-learning model captures the
            hidden relationships between the 13 “easy-to-acquire” logs and the two
            sonic logs (DTC and DTS). The machine learning workflow involves the
            following steps in chronological order:

            1. Identify targets and features in the dataset.
            2. Split the dataset into training and testing datasets. Preprocess the
               training dataset first, then the testing dataset, and finally the new
               dataset on which the models need to be deployed.
            3. Select a set of hyperparameters for the machine learning model.
            4. Train the model on the training dataset. Continuously monitor the
               performance metric to evaluate the memorization error of the model.
            5. Stop training the model after the model performance crosses a certain
               threshold.
            6. After the training, test the model on the testing dataset and compute the
               generalization error of the model.
            7. Compare the model performance on training dataset (memorization error)
               against that on testing dataset (generalization error).
            8. Repeat steps 4 to 7 with another set of hyperparameters until the
               memorization error (training error), the generalization error (testing
               error), and the difference between these two errors are all below
               certain thresholds.
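
            The eight steps above can be sketched end to end in plain Python. This is
            a minimal illustration under stated assumptions, not the models used in
            the study: the data are synthetic (three features rather than 13 logs, one
            target), a ridge-penalized linear model trained by gradient descent stands
            in for a shallow-learning model, R2 serves as the performance metric, the
            scaling statistics are taken from the training dataset only, and every
            threshold and hyperparameter value is hypothetical.

```python
import random


def r2(measured, predicted):
    mean = sum(measured) / len(measured)
    rss = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    tss = sum((m - mean) ** 2 for m in measured)
    return 1.0 - rss / tss


def predict(weights, bias, rows):
    return [bias + sum(w * v for w, v in zip(weights, row)) for row in rows]


def train(rows, targets, l2, lr=0.05, max_epochs=500, stop_r2=0.95):
    """Steps 4-5: gradient descent that monitors the training R2 every epoch
    and stops once it crosses stop_r2 (a hypothetical threshold)."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(max_epochs):
        errors = [p - t for p, t in zip(predict(weights, bias, rows), targets)]
        grads = [2.0 / n * sum(e * row[j] for e, row in zip(errors, rows))
                 + 2.0 * l2 * weights[j] for j in range(k)]
        weights = [w - lr * g for w, g in zip(weights, grads)]
        bias -= lr * 2.0 / n * sum(errors)
        if r2(targets, predict(weights, bias, rows)) >= stop_r2:
            break
    return weights, bias


def run_workflow(seed=0, min_r2=0.9, max_gap=0.05):
    rng = random.Random(seed)
    # Step 1: features and target (synthetic stand-ins for the logs).
    features = [[rng.random() for _ in range(3)] for _ in range(200)]
    targets = [3.0 * a - 2.0 * b + 0.5 * c + rng.gauss(0.0, 0.1)
               for a, b, c in features]
    # Step 2: split, then scale both parts with training-set statistics.
    x_train, x_test = features[:160], features[160:]
    y_train, y_test = targets[:160], targets[160:]
    means = [sum(col) / len(col) for col in zip(*x_train)]
    stds = [(sum((v - mu) ** 2 for v in col) / len(col)) ** 0.5
            for col, mu in zip(zip(*x_train), means)]

    def scale(rows):
        return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)]
                for row in rows]

    x_train, x_test = scale(x_train), scale(x_test)
    # Steps 3 and 8: try one hyperparameter (ridge penalty) at a time.
    for l2 in (1.0, 0.1, 0.0):
        weights, bias = train(x_train, y_train, l2)                 # steps 4-5
        train_r2 = r2(y_train, predict(weights, bias, x_train))
        test_r2 = r2(y_test, predict(weights, bias, x_test))        # step 6
        gap = abs(train_r2 - test_r2)                               # step 7
        if train_r2 >= min_r2 and test_r2 >= min_r2 and gap <= max_gap:
            return {"l2": l2, "train_r2": train_r2, "test_r2": test_r2}
    return None  # no hyperparameter set met the thresholds


result = run_workflow()
```

            Because R2 is a score rather than an error, the thresholds in this sketch
            are lower bounds on the training and testing scores plus an upper bound
            on their difference; with an error metric, the comparisons would be
            reversed.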
            In our study the DTS and DTC logs are the targets, and the 13 other “easy-to-
            acquire” logs are the features. The shallow-learning models will learn to