Page 158 - Machine Learning for Subsurface Characterization

Robust geomechanical characterization Chapter  5 133


             (NPOR), photoelectric factor log (PEFZ), bulk density log (RHOZ), discrete
             lithology flags, and laterolog resistivity logs at 6 depths of investigation (RLA0,
             RLA1, RLA2, RLA3, RLA4, and RLA5). These logs from Well 1 are shown in
             Tracks 2–5 of Fig. 5.1. These easy-to-acquire logs are fed into the six shallow-learning
             models and the five clustering methods. The 4240-ft depth interval in
             Well 1 has 13 distinct lithologies, and the corresponding discrete lithology flag
             ranges from 1 to 13, with one value per lithology. DTC and DTS logs (Fig. 5.1,
             Track 6) are the outputs of the shallow-learning log synthesis models. From a
             machine learning perspective, the 13 “easy-to-acquire” logs are referred to as
             features, and the two logs being synthesized (DTC and DTS) are referred to as targets.


             2.2 Data preprocessing
             Data preprocessing facilitates the training/testing process by appropriately
             transforming and scaling the dataset, and it must be completed before the
             machine learning models are trained. Preprocessing removes outliers and
             scales the features to an equivalent range. We use min-max scaling, which
             promotes fast convergence of the gradient-based learning process,
             especially for neural network models. Min-max scaling is performed
             on one feature at a time using the following equation:

             $$ y'_i = 2\,\frac{y_i - y_{\min}}{y_{\max} - y_{\min}} - 1 \tag{5.1} $$
             where $y_i$ is the original value of a log response ($y$) and $y'_i$ is the scaled value of the
             log response ($y$) at a depth $i$. $y_{\min}$ and $y_{\max}$ are the minimum and maximum values
             of the log response ($y$), respectively. Min-max scaling is performed only on the 13
             “easy-to-acquire” logs, which are considered as features for the shallow-learning task of
             synthesizing DTS and DTC logs. We do not scale the DTS and DTC logs, which are
             the targets for the machine learning task. As mentioned in previous chapters, a
             machine learning workflow first learns from the training dataset, is then
             evaluated on the testing dataset, and is finally deployed on the new dataset. Any
             data preprocessing step should adopt the following sequence of steps: (1)
             perform data preprocessing on the training dataset; (2) learn the statistical
             parameters required for the data preprocessing of the training dataset; and (3)
             perform data preprocessing on the testing dataset and new dataset by applying
             the statistical parameters learnt from the preprocessing of the training dataset. In
             our case, the minimum and maximum of each feature (log) are first learnt during the
             scaling of the training dataset, and then those minimum and maximum values are
             used for scaling the corresponding features in the testing dataset and the new dataset.
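
The fit-on-training, apply-elsewhere sequence described above can be sketched in a few lines of NumPy. This is a minimal illustration of Eq. (5.1), not the authors' implementation: the function names `fit_min_max` and `apply_min_max` and the two-feature arrays are hypothetical and chosen only for demonstration.

```python
import numpy as np

def fit_min_max(train_features):
    """Learn the per-feature minimum and maximum from the training set only."""
    train_features = np.asarray(train_features, dtype=float)
    return train_features.min(axis=0), train_features.max(axis=0)

def apply_min_max(features, y_min, y_max):
    """Scale each column with Eq. (5.1): y' = 2 * (y - y_min) / (y_max - y_min) - 1.

    Assumes no feature is constant in the training set (y_max != y_min).
    """
    features = np.asarray(features, dtype=float)
    return 2.0 * (features - y_min) / (y_max - y_min) - 1.0

# Hypothetical two-feature example (rows are depths, columns are logs).
train = np.array([[2.0, 10.0], [2.5, 20.0], [3.0, 30.0]])
test = np.array([[2.25, 15.0], [3.5, 40.0]])

# Steps (1)-(2): learn the statistics while scaling the training dataset.
y_min, y_max = fit_min_max(train)
train_scaled = apply_min_max(train, y_min, y_max)

# Step (3): reuse the training statistics on the testing (or new) dataset.
test_scaled = apply_min_max(test, y_min, y_max)
```

Because the minimum and maximum come from the training dataset alone, scaled testing values may fall outside the range of -1 to 1 when a testing sample lies outside the training range, as the second row of `test` does here. scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` follows the same fit/transform pattern.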


             2.3 Metric to evaluate the log-synthesis performance
             of the shallow-learning regression models

             The coefficient of determination ($R^2$) is used to compare the prediction performance
             of all models. It is formulated as
             $$ R_j^2 = 1 - \mathrm{RSS}_j / \mathrm{TSS}_j \tag{5.2} $$
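
As a rough numerical illustration of Eq. (5.2), the sketch below assumes the standard definitions: RSS is the sum of squared differences between measured and predicted values, and TSS is the sum of squared differences between measured values and their mean. It also assumes that the index j denotes one target log (for example, DTC). The function name `r_squared` and the sample values are hypothetical.

```python
import numpy as np

def r_squared(measured, predicted):
    """Compute R^2 = 1 - RSS / TSS for a single target log."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rss = np.sum((measured - predicted) ** 2)        # residual sum of squares
    tss = np.sum((measured - measured.mean()) ** 2)  # total sum of squares
    return 1.0 - rss / tss

# Hypothetical measured and synthesized values for one target log.
measured_dtc = np.array([60.0, 70.0, 80.0, 90.0])
predicted_dtc = np.array([62.0, 69.0, 81.0, 88.0])

r2_dtc = r_squared(measured_dtc, predicted_dtc)  # RSS = 10, TSS = 500
```

With these values RSS is 10 and TSS is 500, so the score is 0.98. A perfect prediction gives a score of 1, and a model that always predicts the mean of the measured values gives 0.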