Page 121 - Machine Learning for Subsurface Characterization

Shallow neural networks and classification methods Chapter  3 97



               TABLE 3.B1 Comparison of predictive performance, in terms of R² and
               NRMSE, of the first ANN model that processes 12 conventional logs and 10
               inversion-derived logs with and without the 5 categorical flags. The use of
               features/flags constructed using the KNN classifier drastically improves the
               prediction performance of the ANN model.

                                      Training                Testing
                              R²          NRMSE       R²          NRMSE
               With flags     0.8574      0.1201      0.8549      0.1218
               Without flags  0.7100      0.1760      0.7152      0.1690
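
             The two metrics reported in Table 3.B1 can be computed as in the
             following minimal sketch. The normalization of the RMSE by the range of the
             measured target is an assumption made here for illustration, as are the
             function names and the toy arrays; none of them are taken from the study.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the measured target.

    ASSUMPTION: range normalization; other conventions divide by the
    mean or the standard deviation of the target instead.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())


# Toy values for illustration only (not data from the study).
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

print(f"R2 = {r2(y_true, y_pred):.4f}, NRMSE = {nrmse(y_true, y_pred):.4f}")
```

             A higher R² and a lower NRMSE both indicate better prediction, which is the
             direction of change seen in the table when the flags are included.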



             Appendix C Importance of the 12 conventional logs and
             10 inversion-derived logs
             Machine-learning algorithms process large datasets to develop a data-driven
             model. The outcomes of a data-driven model for a given set of
             features/attributes are primarily governed by the importance of the features.
             Feature importance indicates the significance of a feature for developing a
             robust data-driven model. The importance of a feature for a machine-learning
             task depends on the statistical properties of the feature and on its
             relationship with the other features, the targets, and the machine-learning
             algorithm used to develop the data-driven model. Feature ranking helps us
             understand the inherent decision-making process of a data-driven model and,
             by making the model easier to interpret, helps in evaluating its consistency.
                There are several techniques for ranking features. The most popular
             are the permutation importance method and the feature perturbation
             ranking method. In this study, we use an alternative feature importance
             method, wherein the importance of each feature is quantified as the difference
             between the prediction performance (generalization performance) of the model
             trained on all the features and that of the model trained on all but the one
             feature to be ranked. The contribution of a feature is removed by replacing
             the measurements of that feature for all samples with zeros. A large drop in
             the prediction performance when one feature is removed (i.e., replaced with
             zeros) indicates that the feature is highly significant when developing
             (training) and deploying (testing) the data-driven model.
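
             The zero-replacement ranking described above can be sketched as follows.
             This is an illustrative sketch under stated assumptions, not the code used
             in the study: a linear least-squares fit stands in for the ANN, the data
             are synthetic, the feature is zeroed only at evaluation time on a held-out
             set (the model is not retrained), and all names (zeroing_importance,
             predict, and so on) are hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def zeroing_importance(predict, X, y, score=r2):
    """Return the baseline score and the score drop per zeroed feature."""
    baseline = score(y, predict(X))
    drops = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_zeroed = X.copy()
        X_zeroed[:, j] = 0.0  # remove the contribution of feature j
        drops[j] = baseline - score(y, predict(X_zeroed))
    return baseline, drops


# Synthetic data: feature 0 matters most, feature 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Hold out the last 100 samples to measure generalization performance.
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

# ASSUMPTION: a linear least-squares model stands in for the ANN.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(A):
    return A @ coef


baseline, drops = zeroing_importance(predict, X_test, y_test)
ranking = np.argsort(drops)[::-1]  # most important feature first

print("baseline R2:", round(float(baseline), 4))
print("ranking (most to least important):", ranking.tolist())
```

             Zeroing is most meaningful when the features have been standardized, so
             that zero corresponds to the mean of each feature; whether the logs were
             scaled this way is not stated in this appendix.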
                Fig. 3.C1 shows the feature importance for 27 features, which include 12
             conventional logs, 10 inversion-derived logs, and 5 categorical flags. Flag 1
             (lithology) is the most important feature for the desired synthesis task. Flags
             2, 3, and 4 also have high feature importance. This explains the drastic
             improvement in the ANN-based prediction when using the categorical flags.
             Among the conventional logs, DTCO, DTSM, GR, NPOR, and DPHZ exhibit high