Page 173 - Machine Learning for Subsurface Characterization

Robust geomechanical characterization Chapter  5 147


             intractably heavy. Another benefit of the two-step clustering method is noise
             reduction. The prototypes constructed by SOM are local averages of the data
             and are therefore less sensitive to random variations than the original data. In our
             study, the SOM has a dimension of 50 neurons by 50 neurons, onto which the
             8481 samples from Well 1 are mapped. All “easy-to-acquire” logs were
             fed as inputs to the SOM model. The weights of the SOM were randomly
             initialized. During training, the weight vectors are updated based on the
             similarity between the weight vectors and input vectors, which moves the
             SOM neurons/nodes closer to dense regions of the original data. The
             similarity between data points and SOM nodes during the weight update is
             evaluated using Euclidean distance. The result of the two-step SOM
             followed by K-Means clustering does not have a strong correlation with
             the relative error in log synthesis, as shown in Figs. 5.5D and 5.8.
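The two-step procedure described above (SOM prototypes, then K-Means on the prototypes) can be sketched in plain numpy. This is a minimal illustration, not the authors' implementation: the data here are synthetic stand-ins for the standardized "easy-to-acquire" logs, the grid is 10 by 10 rather than the 50-by-50 SOM used in the study, and the iteration counts, decay schedule, and number of clusters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the standardized "easy-to-acquire" logs
# (the chapter uses 8481 depth samples and 13 logs; sizes here are illustrative).
X = rng.normal(size=(1000, 13))

# --- Step 1: train a SOM (10x10 grid here; the study uses 50x50) ---
rows, cols, n_features = 10, 10, X.shape[1]
weights = rng.normal(size=(rows * cols, n_features))  # randomly initialized, as in the text
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

n_iter = 5000
sigma0, lr0 = 3.0, 0.5  # illustrative neighborhood width and learning rate
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    # Best-matching unit: Euclidean distance between input and weight vectors
    bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
    # Neighborhood width and learning rate decay over time
    sigma = sigma0 * np.exp(-t / n_iter)
    lr = lr0 * np.exp(-t / n_iter)
    # Gaussian neighborhood on the 2-D grid pulls the BMU's neighbors along
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)

# --- Step 2: K-Means (Lloyd's algorithm) on the SOM prototypes, not the raw data ---
k = 5  # illustrative cluster count
centers = weights[rng.choice(len(weights), size=k, replace=False)]
for _ in range(50):
    labels = np.argmin(((weights[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centers[j] = weights[labels == j].mean(axis=0)

# Each depth sample inherits the cluster of its best-matching prototype
sample_bmu = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(-1), axis=1)
sample_cluster = labels[sample_bmu]
print(sample_cluster.shape)
```

Because K-Means runs on only 100 prototypes rather than all depth samples, the second step stays cheap even for large logs, and the prototype averaging provides the noise reduction noted above.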


             3  Results

             3.1 Prediction performances of shallow-learning regression models
             The six shallow-learning regression models discussed in the earlier section were
             trained and tested on 8481 data points (distinct depths) acquired from the 4240-
             ft depth interval in Well 1 and deployed for blind testing on 2920 data points
             acquired from the 1460-ft depth interval in Well 2. The models were trained on
             80% of randomly selected data from Well 1 and tested on the remaining 20%.
             Each data-driven model is trained to capture the hidden relationships
             of the 13 “easy-to-acquire” logs (features) with the DTC and DTS sonic logs
             (targets). The performance of log synthesis is evaluated in terms of the
             coefficient of determination, R². The log-synthesis results for Wells 1 and 2
             are shown in Table 5.2. The log-synthesis performances for DTC and DTS
             in Wells 1 and 2 in terms of R² are illustrated in Fig. 5.6. OLS and PLS
             exhibit similar performances during training and testing but not during the
             deployment (blind testing). LASSO and ElasticNet have relatively similar
             performances during training, testing, and blind testing. Among the six
             models, ANN performs the best, with an R² of 0.85 during training and testing
             and 0.84 during blind testing, whereas LASSO and ElasticNet exhibit the
             worst performance, with an R² of 0.76 during blind testing. Cross-validation
             was performed to ensure that each model is trained and tested on all the
             statistical features present in the dataset, which is crucial for the robustness
             of the shallow-learning models. As shown in Table 5.2, when the trained
             models are deployed in Well 2, all models exhibit a slight decrease in
             prediction accuracy. ANN has the best performance during the deployment
             stage in Well 2. The accuracy of the DTC and DTS logs synthesized using
             the ANN model is shown in Fig. 5.7, where the measured and synthesized
             sonic logs are compared across 300 randomly selected depth samples from
             Well 2 for the purpose of blind testing; that is, no data from Well 2 was
             used to train the model.
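The evaluation workflow above (an 80/20 random split on the training well, then scoring with the coefficient of determination) can be sketched in numpy. This is an illustrative sketch only: the data are synthetic stand-ins for the 13 feature logs and one sonic target, and ordinary least squares stands in for just one of the six shallow models; the noise level and sample count are assumptions, not the chapter's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for Well 1: 13 feature logs and one target (e.g., DTC).
n = 2000
X = rng.normal(size=(n, 13))
true_w = rng.normal(size=13)          # hypothetical linear relationship
y = X @ true_w + 0.3 * rng.normal(size=n)

# 80/20 random train/test split, as described for Well 1
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]

# OLS fit (one of the six shallow models) via least squares with an intercept
A_train = np.column_stack([X[train], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

A_test = np.column_stack([X[test], np.ones(len(test))])
y_hat = A_test @ coef

# Coefficient of determination: R² = 1 - SS_res / SS_tot
ss_res = np.sum((y[test] - y_hat) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 3))
```

Blind testing on a second well would reuse the fitted `coef` on that well's features without any refitting, which is why the Well 2 scores in Table 5.2 sit slightly below the Well 1 test scores.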