Page 173 - Machine Learning for Subsurface Characterization
Robust geomechanical characterization Chapter 5
intractably heavy. Another benefit of the two-step clustering method is noise
reduction. The prototypes constructed by SOM are local averages of the data
and, therefore, less sensitive to random variations than the original data. In our
study the SOM has a dimension of 50 neurons by 50 neurons, upon which the
8481 samples from Well 1 will be mapped. All “easy-to-acquire” logs were
fed as inputs to the SOM model. The weights of SOM were randomly
initialized. During training the weight vectors are updated based on the
similarity between the weight vectors and input vectors, which results in
moving the SOM neurons/nodes closer to certain dense regions of the original
data. The similarity between data points and SOM nodes during the weight
update is evaluated based on Euclidean distance. The result of the two-step
SOM followed by K-Means clustering does not have a strong correlation with
the relative error in log synthesis, as shown in Figs. 5.5D and 5.8.
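The two-step workflow described above (SOM prototype construction followed by K-means on the prototypes) can be sketched as follows. This is a minimal, self-contained illustration with synthetic stand-in data and a small 10-by-10 grid rather than the chapter's 50-by-50 SOM trained on the 8481 Well-1 samples; the hyperparameters (learning rate, neighborhood radius, cluster count) are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the Well-1 feature logs (the chapter maps 8481
# samples of 13 "easy-to-acquire" logs; 2000 samples of 4 features here).
X = rng.normal(size=(2000, 4))

# Step 1: train a SOM (10x10 grid here; the chapter uses 50x50 neurons).
grid_h, grid_w = 10, 10
weights = rng.normal(size=(grid_h * grid_w, X.shape[1]))  # random initialization
coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)],
                  dtype=float)

n_epochs, sigma0, lr0 = 3, 3.0, 0.5
t, t_max = 0, n_epochs * len(X)
for _ in range(n_epochs):
    for x in X[rng.permutation(len(X))]:
        # Best-matching unit: nearest node by Euclidean distance,
        # the similarity measure used in the chapter.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        frac = t / t_max
        lr = lr0 * (1.0 - frac)              # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood radius
        # Gaussian neighborhood on the 2-D grid pulls nearby nodes toward
        # the sample, so each prototype becomes a local average of the data
        # (the noise-reduction effect noted in the text).
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
        t += 1

# Step 2: run K-means on the 100 SOM prototypes instead of the raw samples.
node_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(weights)

# Each sample inherits the cluster of its best-matching prototype.
bmu_all = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2),
                    axis=1)
sample_labels = node_labels[bmu_all]
```

Because K-means sees only the 100 prototypes rather than the thousands of raw samples, the second step stays cheap even for large logging datasets, which is the computational benefit motivating the two-step design.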
3 Results
3.1 Prediction performances of shallow-learning regression models
The six shallow-learning regression models discussed in the earlier section were
trained and tested on 8481 data points (distinct depths) acquired from the 4240-
ft depth interval in Well 1 and deployed for blind testing on 2920 data points
acquired from a 1460-ft depth interval in Well 2. Model training used 80% of
randomly selected data from Well 1, and model testing used the remaining
20%. Each data-driven model is trained to capture the hidden relationships
of the 13 “easy-to-acquire” logs (features) with the DTC and DTS sonic logs
(targets). The performance of log synthesis is evaluated in terms of the
coefficient of determination, R². The log-synthesis results for Wells 1 and 2
are shown in Table 5.2. The log-synthesis performances for DTC and DTS
in Wells 1 and 2 in terms of R² are illustrated in Fig. 5.6. OLS and PLS
exhibit similar performances during training and testing but not during the
deployment (blind testing). LASSO and ElasticNet have relatively similar
performances during training, testing, and blind testing. Among the six
models, ANN performs the best with R² of 0.85 during training and testing
and 0.84 during the blind testing, whereas LASSO and ElasticNet exhibit the
worst performance with R² of 0.76 during the blind testing. Cross validation
was performed to ensure the model is trained and tested on all the statistical
features present in the dataset, which is crucial for the robustness of the
shallow-learning models. As shown in Table 5.2, when the trained models
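The cross-validation step mentioned above can be sketched as follows, again on synthetic stand-in data; the fold count of 5 and the OLS estimator are illustrative assumptions, not the study's reported configuration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 13))
y = X @ rng.normal(size=13) + 0.1 * rng.normal(size=500)

# K-fold CV: every sample serves in a validation fold exactly once, so the
# model is assessed against all statistical features present in the dataset.
cv_scores = cross_val_score(LinearRegression(), X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0),
                            scoring="r2")
mean_r2 = cv_scores.mean()
```

A stable `mean_r2` with low spread across folds indicates the model is not overly sensitive to which portion of the well's depth interval it was trained on.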
are deployed in Well 2, all models exhibit a slight decrease in prediction
accuracy. ANN has the best performance during the deployment stage in
Well 2. The accuracy of the DTC and DTS logs synthesized using the ANN
model is shown in Fig. 5.7, where the measured and synthesized sonic logs
are compared across 300 randomly selected depth samples from Well 2 for
the purpose of blind testing; that is, no data from Well 2 was used for
training the model.
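The blind-testing protocol described here, training only on one well and evaluating on another, can be sketched as below. The data are synthetic stand-ins: a shared linear relationship plays the role of the common rock physics, and a small shift in the Well-2 feature distribution mimics well-to-well variability; the network architecture is an illustrative assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
w = rng.normal(size=(13, 2))  # shared "physics" linking feature logs to DTC/DTS

# Well-1 stand-in: the only data used for training.
X1 = rng.normal(size=(1000, 13))
y1 = X1 @ w + 0.1 * rng.normal(size=(1000, 2))

# Well-2 stand-in: same relationship, slightly shifted feature distribution.
X2 = rng.normal(loc=0.3, size=(300, 13))
y2 = X2 @ w + 0.1 * rng.normal(size=(300, 2))

ann = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
ann.fit(X1, y1)  # no Well-2 data is seen during training

# Deployment-stage (blind) accuracy on the held-out well.
blind_r2 = r2_score(y2, ann.predict(X2))
```

The gap between the training-well score and `blind_r2` quantifies the generalization loss that the text reports as a slight accuracy decrease when models trained on Well 1 are deployed in Well 2.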