Page 158 - Machine Learning for Subsurface Characterization


Robust geomechanical characterization Chapter 5 133

(NPOR), photoelectric factor log (PEFZ), bulk density log (RHOZ), discrete
lithology flags, and laterolog resistivity logs at 6 depths of investigation (RLA0,
RLA1, RLA2, RLA3, RLA4, and RLA5). These logs from Well 1 are shown in
Tracks 2–5 of Fig. 5.1. These easy-to-acquire logs are fed into the six
shallow-learning models and the five clustering methods. The 4240-ft depth
interval in Well 1 has 13 distinct lithologies, and the corresponding discrete
lithology flag ranges from 1 to 13, indicating the 13 lithologies. DTC and DTS
logs (Fig. 5.1, Track 6) are the outputs of the shallow-learning log synthesis
models. From a machine learning perspective, the 13 “easy-to-acquire” logs will
be referred to as features, and the two logs being synthesized (DTC and DTS)
will be referred to as targets.

2.2 Data preprocessing
Data preprocessing aims to facilitate the training/testing process by
appropriately transforming and scaling the entire dataset, and it is
necessary before training the machine learning models. Preprocessing
removes outliers and scales the features to an equivalent range. We use
min-max scaling, which promotes fast convergence of the gradient-based
learning process, especially for neural network models. Min-max scaling is
performed on one feature at a time, mapping it to the range [-1, 1], using
the following equation:

$$
y'_i = 2\,\frac{y_i - y_{\min}}{y_{\max} - y_{\min}} - 1 \tag{5.1}
$$
where $y_i$ is the original value of a log response ($y$) and $y'_i$ is the
scaled value of the log response ($y$) at a depth $i$. $y_{\min}$ and
$y_{\max}$ are the minimum and maximum values of the
log response ($y$), respectively. Min-max scaling is performed only on the 13
“easy-to-acquire” logs, which are considered as features for the
shallow-learning task of synthesizing DTS and DTC logs. We do not scale the
DTS and DTC logs, which are the targets for the machine learning task. As
mentioned in previous chapters, a machine learning workflow first learns from
the training dataset, is then evaluated on the testing dataset, and is finally
deployed on the new dataset. Any data preprocessing step should adopt the
following sequence of steps: (1) perform data preprocessing on the training
dataset; (2) learn the statistical parameters required for the data
preprocessing of the training dataset; and (3) perform data preprocessing on
the testing dataset and new dataset by applying the statistical parameters
learnt from the preprocessing of the training dataset. In our case, the
minimum and maximum of each feature (log) are first learnt during the scaling
of the training dataset, and then those minimum and maximum values are used
for scaling the corresponding features in the testing dataset and the new
dataset.
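
The fit-on-training, apply-elsewhere sequence described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `fit_min_max` and `apply_min_max`, the two-feature layout, and all numerical values are hypothetical, and a constant feature is mapped to 0 purely as an assumed convention to avoid division by zero.

```python
def fit_min_max(train_rows):
    """Learn the per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*train_rows))  # transpose: one tuple per feature (log)
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return mins, maxs


def apply_min_max(rows, mins, maxs):
    """Scale every feature with Eq. (5.1) using previously learnt statistics."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                # Assumed convention: a constant feature carries no information.
                scaled_row.append(0.0)
            else:
                scaled_row.append(2.0 * (value - lo) / (hi - lo) - 1.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical values for two features (e.g., bulk density and neutron porosity).
train = [[2.30, 0.10], [2.50, 0.20], [2.70, 0.30]]
test = [[2.40, 0.25], [2.80, 0.05]]

# Steps (1) and (2): learn the statistics while scaling the training dataset.
mins, maxs = fit_min_max(train)
scaled_train = apply_min_max(train, mins, maxs)

# Step (3): reuse the training statistics on the testing (or new) dataset.
scaled_test = apply_min_max(test, mins, maxs)
```

Because the testing dataset is scaled with the training minimum and maximum, its scaled values are not guaranteed to stay within [-1, 1]; in this sketch the second testing row falls outside the training range for both features.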

2.3 Metric to evaluate the log-synthesis performance
of the shallow-learning regression models

The coefficient of determination ($R^2$) is used to compare the prediction
performance of all models. It is formulated as
$$
R_j^2 = 1 - \frac{RSS_j}{TSS_j} \tag{5.2}
$$
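
As a minimal sketch of Eq. (5.2), the snippet below assumes the standard definitions: the residual sum of squares (RSS) is the sum of squared differences between measured and predicted values, and the total sum of squares (TSS) is the sum of squared differences between measured values and their mean. The subscript j is taken here to index the log being synthesized (an assumption), so the function would be called once per target. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(measured, predicted):
    """Return 1 - RSS/TSS for one target log (standard definitions assumed)."""
    mean_measured = sum(measured) / len(measured)
    # Residual sum of squares: mismatch between measurement and prediction.
    rss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    tss = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - rss / tss


# Hypothetical measured and synthesized values for a single target log.
measured = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 69.0, 81.0, 88.0]
score = r_squared(measured, predicted)
```

A value close to 1 indicates that the synthesized log explains most of the variance in the measured log, whereas a model that always predicts the mean of the measured log scores 0.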
