Page 258 - Machine Learning for Subsurface Characterization
2 Dataset
The goal of this study is to generate NMR T2 distribution from other easy-to-
acquire subsurface information and compare the performance of various models
in this log-synthesis task. The dataset used for the comparison contains raw
and inverted logs acquired from 575 depth points in a vertical well drilled in a
shale formation. The 300-ft shale interval of interest for our investigation
comprises seven distinct geological formations, as shown in Fig. 8.1.
Formations F1 to F3 are source rock shale, and formations F4 to F7 are
clay-rich dolomudstone (Fig. 8.1).
Two distinct types of features/inputs are used in this study, namely, raw logs
and inverted logs. For purposes of comparison, we separately build the shallow
and deep models on each of the two types of features. Twelve raw logs used in
this investigation are five array-induction resistivity logs at various depths
of investigation (AF10, AF20, AF30, AF60, and AF90), caliper (DCAL),
compressional sonic (DTCO), shear sonic (DTSM), gamma ray (GR), neutron
porosity (NPOR), PEFZ, and formation density (RHOZ) logs. Ten inverted
logs used in this investigation are anhydrite, calcite, chlorite, dolomite, illite,
K-feldspar, quartz, free water, oil, and bound-water logs. The inverted logs
are computed by data inversion of the raw resistivity, neutron, density, and
gamma ray logs. Inverted logs can be considered specially engineered
features that facilitate model training. When raw logs are used instead,
deeper networks are required for the same learning tasks, which demands more
computation time and memory. However, raw logs have the advantage that no
preprocessing infrastructure is required. Inverted composition logs have
previously been used for NMR T2 synthesis [15, 16].
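The two feature sets described above can be organized as separate design matrices, one per model family. A minimal sketch with synthetic values standing in for the well data (the column names below simply mirror the logs listed in the text; they are illustrative, not the authors' actual file schema):

```python
import numpy as np
import pandas as pd

# Hypothetical column names mirroring the logs described in the text
RAW_LOGS = ["AF10", "AF20", "AF30", "AF60", "AF90", "DCAL",
            "DTCO", "DTSM", "GR", "NPOR", "PEFZ", "RHOZ"]
INVERTED_LOGS = ["anhydrite", "calcite", "chlorite", "dolomite", "illite",
                 "kfeldspar", "quartz", "free_water", "oil", "bound_water"]

rng = np.random.default_rng(0)
n_depths = 575  # depth points sampled in the study well

# Synthetic stand-in for the measured logs at each depth point
df = pd.DataFrame(rng.random((n_depths, len(RAW_LOGS) + len(INVERTED_LOGS))),
                  columns=RAW_LOGS + INVERTED_LOGS)

X_raw = df[RAW_LOGS].to_numpy()       # 12 raw-log features
X_inv = df[INVERTED_LOGS].to_numpy()  # 10 inverted-log features
```

Keeping the two matrices separate makes it straightforward to train and score each model family on either feature set for the comparison.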
3 Shallow-learning models
Six shallow-learning models were trained using supervised learning to synthesize
the 64 discrete T2 amplitudes that constitute the entire NMR T2 distribution. These
regression-type models include ordinary least squares (OLS), least absolute
shrinkage and selection operator (LASSO), and ElasticNet that are simple linear
regression models. Support vector regression (SVR), an extension of the
support vector classifier (SVC), was used as the fourth regression model. SVR
model can predict only one target at a time. The fifth model is the k-nearest
neighbor regressor (kNNR), an extension of the kNN classifier, which can
simultaneously predict multiple targets. The sixth model is a simple neural
network based on the multilayer perceptron for simultaneous prediction of multiple
targets.
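The six shallow models can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the authors' configuration; the hyperparameter values (alpha, n_neighbors, hidden layer size) are placeholder assumptions. Note that SVR is wrapped in MultiOutputRegressor, since it fits one target at a time, while kNNR and the multilayer perceptron handle all 64 targets natively:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 12))  # e.g., 12 raw-log features
Y = rng.random((100, 64))  # 64 discrete T2 amplitudes per depth

models = {
    "OLS": LinearRegression(),
    "LASSO": Lasso(alpha=0.01),
    "ElasticNet": ElasticNet(alpha=0.01),
    # SVR predicts a single target, so fit one regressor per T2 amplitude
    "SVR": MultiOutputRegressor(SVR()),
    "kNNR": KNeighborsRegressor(n_neighbors=5),  # multi-target natively
    "MLP": MLPRegressor(hidden_layer_sizes=(64,), max_iter=500),
}

for name, model in models.items():
    model.fit(X, Y)  # each model maps the features to all 64 T2 bins
```

Each fitted model then returns a full 64-bin T2 distribution per depth point via `model.predict(X)`.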
3.1 Ordinary least squares
OLS is the simplest linear regression model. The OLS model is built by minimizing
the cost/loss function shown in Eq. (8.1), formulated as the square of the L2 norm
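Minimizing the squared-L2-norm cost admits the familiar least-squares solution. A minimal sketch on synthetic data (the coefficients and noise level below are illustrative assumptions, unrelated to Eq. (8.1)'s actual symbols):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 5))
true_w = np.array([2.0, -1.0, 0.5, 3.0, 1.5])
y = X @ true_w + 0.01 * rng.standard_normal(200)  # small additive noise

# Minimize ||y - Xw||_2^2; lstsq solves the least-squares problem stably
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With low noise, the recovered weights closely match the generating coefficients, illustrating why the squared L2 norm is the standard OLS objective.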