Page 159 - Machine Learning for Subsurface Characterization
where
$$\mathrm{RSS}_j = \sum_{i=1}^{n} \left( y^{p}_{i,j} - y^{m}_{i,j} \right)^{2} \tag{5.3}$$
and
$$\mathrm{TSS}_j = \sum_{i=1}^{n} \left( y^{m}_{i,j} - \bar{y}_j \right)^{2} \tag{5.4}$$
where $n$ is the total number of depths for which DTC and DTS logs need to be
synthesized; $j = 1$ indicates the DTC log and $j = 2$ indicates the DTS log;
$i$ represents a specific depth; and $y^{p}_{i,j}$ is the sonic log $j$ predicted at depth $i$,
$y^{m}_{i,j}$ is the sonic log $j$ measured at depth $i$, and $\bar{y}_j$ is the mean of sonic log $j$
measured at all depths in the training or testing dataset. $\mathrm{RSS}_j$ is the sum of
squares of the residuals, and $\mathrm{TSS}_j$ is the total sum of squares proportional to
the variance of the corresponding sonic log $j$.
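The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the coefficient of determination takes the usual form R² = 1 − RSS/TSS and that TSS is taken about the mean of the measured log. The function name `r2_for_log` and the sample values are hypothetical.

```python
from typing import Sequence


def r2_for_log(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Return R^2 = 1 - RSS/TSS for one sonic log (one value of j)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("measured and predicted must be non-empty and equal in length")

    # Mean of the measured log over all depths in the dataset.
    mean_measured = sum(measured) / len(measured)

    # RSS: sum of squared residuals between predicted and measured values.
    rss = sum((p - m) ** 2 for p, m in zip(predicted, measured))

    # TSS: total sum of squares of the measured log about its mean.
    tss = sum((m - mean_measured) ** 2 for m in measured)
    if tss == 0.0:
        raise ValueError("TSS is zero: the measured log is constant")

    return 1.0 - rss / tss


if __name__ == "__main__":
    # Illustrative numbers only, not field data.
    measured_log = [1.0, 2.0, 3.0, 4.0]
    predicted_log = [1.1, 1.9, 3.2, 3.8]
    print(r2_for_log(measured_log, predicted_log))
```

With two targets, the function would be called once per log (j = 1 for DTC, j = 2 for DTS), separately on the training and the testing dataset.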
2.4 Shallow-learning regression models
Six shallow-learning models synthesize DTS and DTC logs by processing 13
“easy-to-acquire” logs. Each trained shallow-learning model captures the
hidden relationships between the 13 “easy-to-acquire” logs and the two sonic
logs, DTC and DTS. The machine learning workflow involves the following
steps in chronological order:
1. Identify targets and features in the dataset.
2. Split the dataset into training and testing datasets. Preprocess the
training dataset first, then the testing dataset, and finally the new
dataset on which the models need to be deployed.
3. Select a set of hyperparameters for the machine learning model.
4. Train the model on the training dataset. Continuously monitor the
performance metric to evaluate the memorization error of the model.
5. Stop training the model after the model performance crosses a certain
threshold.
6. After training, test the model on the testing dataset and compute the
generalization error of the model.
7. Compare the model performance on the training dataset (memorization error)
against that on the testing dataset (generalization error).
8. Repeat steps 4, 5, 6, and 7 with another set of hyperparameters until
the memorization error (training error), the generalization error (testing
error), and the difference between these two errors are below certain
thresholds.
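The eight steps above can be sketched end to end in plain Python. This is a hedged illustration under stated assumptions, not one of the six models used in the study: the data are synthetic (3 feature logs and 1 target instead of 13 and 2), the model is a linear regressor trained by gradient descent with an L2 penalty, preprocessing is standardization using training-set statistics, the error metric is mean squared error, and every function name, hyperparameter value, and threshold is hypothetical.

```python
import random


def make_synthetic_logs(n_depths=200, seed=0):
    """Hypothetical stand-in data: 3 feature logs and 1 target log per depth."""
    rng = random.Random(seed)
    true_weights = [1.5, -2.0, 0.5]
    X, y = [], []
    for _ in range(n_depths):
        row = [rng.gauss(0.0, 1.0) for _ in true_weights]
        X.append(row)
        y.append(sum(w * v for w, v in zip(true_weights, row)) + rng.gauss(0.0, 0.1))
    return X, y


def fit_scaler(X):
    """Column means and standard deviations (assumed preprocessing: standardization)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [(sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
            for col, m in zip(zip(*X), means)]
    return means, stds


def apply_scaler(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def predict(X, w, b):
    return [sum(wk * xk for wk, xk in zip(w, row)) + b for row in X]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def train(X, y, learning_rate, l2_penalty, max_epochs=500, stop_mse=0.03):
    """Gradient descent on MSE + l2_penalty * ||w||^2, monitored every epoch."""
    n, w, b = len(X), [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        residuals = [p - t for p, t in zip(predict(X, w, b), y)]
        grad_w = [2.0 / n * sum(r * row[k] for r, row in zip(residuals, X))
                  + 2.0 * l2_penalty * w[k] for k in range(len(w))]
        grad_b = 2.0 / n * sum(residuals)
        w = [wk - learning_rate * gk for wk, gk in zip(w, grad_w)]
        b -= learning_rate * grad_b
        # Steps 4 and 5: monitor the training error and stop once it crosses the threshold.
        if mse(y, predict(X, w, b)) < stop_mse:
            break
    return w, b


def run_workflow(max_train_mse=0.05, max_test_mse=0.1, max_gap=0.05):
    # Step 1: features X and target y.
    X, y = make_synthetic_logs()
    # Step 2: split, then preprocess the training set first and reuse its statistics.
    split = int(0.8 * len(X))
    X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
    means, stds = fit_scaler(X_train)
    X_train, X_test = apply_scaler(X_train, means, stds), apply_scaler(X_test, means, stds)
    # Steps 3 and 8: try one hyperparameter set after another.
    candidates = [{"learning_rate": 0.05, "l2_penalty": 1.0},
                  {"learning_rate": 0.05, "l2_penalty": 0.0}]
    for params in candidates:
        w, b = train(X_train, y_train, **params)
        train_error = mse(y_train, predict(X_train, w, b))  # memorization error
        test_error = mse(y_test, predict(X_test, w, b))     # step 6: generalization error
        gap = abs(test_error - train_error)                 # step 7: compare the two
        if train_error < max_train_mse and test_error < max_test_mse and gap < max_gap:
            return params, train_error, test_error
    return None, None, None


if __name__ == "__main__":
    print(run_workflow())
```

The heavily penalized first candidate underfits and is rejected on its training error, so the loop moves on to the next hyperparameter set, which mirrors step 8. Reusing the training-set means and standard deviations for the testing set is a design choice of this sketch, made so that no information from the testing data enters the preprocessing.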
In our study the DTS and DTC logs are the targets, and the 13 other
“easy-to-acquire” logs are the features. The shallow-learning models will learn to