Page 101 - Machine Learning for Subsurface Characterization


            3.3 Testing the first ANN model
            The testing dataset contains only 62 depths, far fewer than the 354 depths
            used to train the ANN model. The prediction performance on the testing
            dataset (also referred to as the generalization performance) is similar to
            that on the training dataset. Fig. 3.6 presents the prediction performance
            (in terms of NRMSE) of the first ANN model on the testing dataset. The
            median R² and median NRMSE of predictions on the testing dataset are 0.8549
            and 0.1218, respectively. This testing performance is remarkable given the
            hostile subsurface borehole conditions under which the logs were acquired,
            which result in a low signal-to-noise ratio, and the limited size of the
            dataset available to build the model, which tends to cause overfitting and
            poor generalization. Fig. 3.B2 shows the histograms of NRMSE for the training
            and testing datasets without the implementation of the five categorical
            Flags (1–5) as additional features. Comparison of Fig. 3.B2 with Fig. 3.6
            highlights the necessity of Flags as categorical features to achieve good
            generalization performance.
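
The median scores quoted above can be reproduced from per-depth predictions along the following lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, and NRMSE is assumed here to be the RMSE divided by the range (maximum minus minimum) of the measured values at each depth, which may differ from the normalization used elsewhere in this chapter.

```python
import math
from statistics import median


def r2_score(measured, predicted):
    """Coefficient of determination for the distribution at one depth."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def nrmse(measured, predicted):
    """RMSE normalized by the range of the measured values (assumed convention)."""
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    return math.sqrt(mse) / (max(measured) - min(measured))


def summarize(measured_by_depth, predicted_by_depth):
    """Return (median R2, median NRMSE) over all depths in a dataset."""
    pairs = list(zip(measured_by_depth, predicted_by_depth))
    return (median(r2_score(m, p) for m, p in pairs),
            median(nrmse(m, p) for m, p in pairs))
```

Calling `summarize` once on the training depths and once on the testing depths gives the two pairs of medians that are compared to judge generalization.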
               Fig. 3.C1 lists all the features in terms of their importance to the
            data-driven task of T₂ synthesis. A feature-importance analysis was performed
            to find the most important of the 27 features, which comprise 10 conventional
            logs, 12 inversion-derived logs, and 5 categorical features. The importance
            of a feature for a machine-learning task depends on the statistical properties
            of the feature and on the relationship of the feature with other features,
            the targets, and the machine-learning algorithm used to develop the
            data-driven model. Feature importance indicates the significance of a
            feature for developing a robust data-driven model. It also helps us
            understand the inherent decision-making process of a data-driven model and
            evaluate the consistency of the model by making it easier to interpret.
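
The text does not state which feature-importance method was used. As one common, model-agnostic illustration, the sketch below uses permutation importance: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. The toy model, the data, and all names are hypothetical stand-ins, not the ANN or the logs from this study.

```python
import random


def mse(model, X, y):
    """Mean squared error of a model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy stand-in for a trained model: uses features 0 and 1, ignores feature 2
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

In this toy case the ignored feature receives zero importance and the strongly weighted feature receives the largest score, which illustrates how importance depends on the relationship between a feature, the target, and the model.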


            3.4 Training the second ANN model
            The second ANN model involves a two-step training process: (1) parameterizing
            the T₂ distribution by fitting a bimodal Gaussian distribution and (2) training
            the ANN model to predict the six parameters governing the bimodal Gaussian
            distribution fit to the T₂ distribution. After this two-step training process,
            the trained ANN model can generate the six parameters of the bimodal Gaussian
            distribution. The prediction performance of the second model is affected by the
            errors in fitting the T₂ distribution with a bimodal Gaussian distribution (listed
            in Table 3.1). Fig. 3.8 presents the prediction performance of the second ANN
            model for 25 randomly selected depths from the training dataset.
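
To make the role of the six parameters concrete, the sketch below reconstructs a T₂ distribution from one parameter set. It assumes that each of the two Gaussian modes is described by an amplitude, a mean, and a standard deviation on a log10(T₂) axis. The exact parameterization, the number of bins, and the axis limits are assumptions for illustration only.

```python
import math


def bimodal_gaussian(log_t2, a1, mu1, sigma1, a2, mu2, sigma2):
    """Sum of two Gaussian modes evaluated at one log10(T2) value."""
    g1 = a1 * math.exp(-0.5 * ((log_t2 - mu1) / sigma1) ** 2)
    g2 = a2 * math.exp(-0.5 * ((log_t2 - mu2) / sigma2) ** 2)
    return g1 + g2


def reconstruct(params, n_bins=64, log_min=-1.0, log_max=4.0):
    """Rebuild a T2 distribution on an evenly spaced log10(T2) grid.

    n_bins, log_min, and log_max are illustrative values, not taken from the text.
    """
    step = (log_max - log_min) / (n_bins - 1)
    grid = [log_min + i * step for i in range(n_bins)]
    return grid, [bimodal_gaussian(x, *params) for x in grid]


# Hypothetical parameter set, e.g. as predicted by the ANN for one depth
params = (1.0, 0.5, 0.3, 0.6, 2.5, 0.4)
grid, distribution = reconstruct(params)
```

Because the network outputs only these six numbers, any mismatch between the fitted bimodal curve and the measured distribution carries through to the final prediction, which is why the fitting errors limit the second model's accuracy.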
               The median R² and median NRMSE of predictions of the second ANN model
            on the training dataset are 0.7634 and 0.1571, respectively, as compared with
            0.8574 and 0.1201, respectively, for the first ANN model. The prediction
            performance on the training dataset (also referred to as the memorization
            performance) of the first ANN model is therefore superior to that of the
            second model, but the computational time of the first ANN model is 30% more
            than that of the second model. Histograms of NRMSE of predictions for the 354 depths