Page 94 - Machine Learning for Subsurface Characterization


            data-driven model. Min-max scaler is well suited when the feature distribution
            is non-Gaussian in nature and the feature follows a strict bound (e.g., image
            pixels). Depths exhibiting outlier log responses need to be removed prior to
            min-max scaling, which is drastically influenced by outliers. Moreover, the
            existence of outlier log responses adversely affects the weights and biases of
            the neurons learnt during the training of an ANN model. In this study, a depth
            is considered an outlier when one of the logs has an abnormally large or small
            value compared with the general trend of the log. At some depths of the shale
            system under investigation, the log responses are unrealistic and abnormally
            high; for example, a gamma ray larger than 1000 API units or DTSM larger
            than 800 μs/ft. Such outliers are referred to as global outliers, which require a
            simple thresholding technique for detection, followed by the removal of the
            depths exhibiting the outlier log responses.
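The global-outlier removal described above can be sketched as a simple threshold mask. The array names and sample values below are hypothetical; only the two thresholds (gamma ray above 1000 API units, DTSM above 800 μs/ft) come from the text.

```python
import numpy as np

# Hypothetical well-log arrays sampled along depth; the thresholds
# (GR > 1000 API, DTSM > 800 us/ft) are the global-outlier bounds
# stated in the text.
gr = np.array([85.0, 120.0, 1500.0, 95.0, 110.0])     # gamma ray, API units
dtsm = np.array([210.0, 950.0, 230.0, 225.0, 240.0])  # shear slowness, us/ft

# Keep only depths where every log lies within its physical bound;
# a depth is dropped if any one of its logs is an outlier.
mask = (gr <= 1000.0) & (dtsm <= 800.0)
gr_clean, dtsm_clean = gr[mask], dtsm[mask]
```

Because the mask is applied to all logs at once, the entire depth is removed even when only one log at that depth is abnormal, which matches the depth-wise removal described above.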
               After the removal of depths exhibiting abnormal log responses, each log
            (feature or target) is transformed to a value between -1 and 1 using the min-
            max scaler. Min-max scaling forces features and targets to lie in the same range,
            which guarantees stable convergence of weights and biases in the ANN model
            [11]. Min-max scaling was performed using the following equation:

                                     y = 2 (x - x_min) / (x_max - x_min) - 1                     (3.9)
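Eq. (3.9) can be implemented directly. The function name below is a hypothetical helper; the arithmetic follows the equation term by term, mapping x_min to -1 and x_max to 1.

```python
import numpy as np

def minmax_scale(x):
    """Scale a log to [-1, 1] per Eq. (3.9): y = 2*(x - x_min)/(x_max - x_min) - 1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

# The minimum maps to -1, the midpoint to 0, and the maximum to 1.
y = minmax_scale([10.0, 20.0, 30.0])
```

Note that x_min and x_max are taken over the training data; because they are extreme order statistics, a single outlier shifts them directly, which is why outlier depths must be removed before scaling.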
            where x is the original (unscaled) value of the feature or target and y is the scaled
            value of x. Scaling is performed for all the features so that each feature has
            the same influence when training the model. Scaling is essential when using
            distance-, density-, and gradient-based learning methods. ANNs rely on gradients
            for updating the weights. For ANN models, unscaled features can result in a
            slow or unstable learning process, whereas unscaled targets can result in
            exploding gradients that cause the learning process to fail. Unscaled features are
            spread out over orders of magnitude, resulting in a model that may learn
            large-valued weights. When using traditional backpropagation with a sigmoid
            activation function, unscaled features can saturate the sigmoid derivative during
            training. Such a model is unstable and exhibits poor generalization performance.
            However, when certain variations of backpropagation, such as resilient
            backpropagation, are used to estimate the weights of the neural network, the
            network is more robust to unscaled features because the algorithm uses the sign
            of the gradient, and not its magnitude, when updating the weights.
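The sign-only property of resilient backpropagation (Rprop) mentioned above can be illustrated with a minimal single-step sketch. This is not the book's implementation; the function and its default factors (eta_plus = 1.2, eta_minus = 0.5) follow the standard Rprop formulation, in which the per-weight step size adapts based on whether the gradient changed sign.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One Rprop-style update: only the SIGN of the gradient is used,
    so the weight change is independent of the gradient's magnitude."""
    same_sign = grad * prev_grad
    # Gradient kept its sign: grow the step; sign flipped: shrink it.
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    return w - np.sign(grad) * step, step
```

A gradient of 100 and a gradient of 0.001 (with the same sign history) produce the same weight change, which is why Rprop is less sensitive to unscaled features than magnitude-based backpropagation.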


            2.7 Training and testing methodology for the ANN models
            After the feature scaling, the dataset (comprising features and targets) is split
            into two parts: training data and testing data. Usually, 80% of the data are selected
            as the training data, and the remaining 20% of the original data constitute the
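The 80/20 split can be sketched with a random permutation of depth indices. The array shapes and seed below are illustrative placeholders, not values from the study.

```python
import numpy as np

# Synthetic scaled dataset: 100 depths, 4 feature logs, 1 target log
# (shapes are illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 4))
y = rng.uniform(-1, 1, size=(100, 1))

# Shuffle depth indices, then take the first 80% for training and
# the remaining 20% for testing.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, X_test = X[idx[:n_train]], X[idx[n_train:]]
y_train, y_test = y[idx[:n_train]], y[idx[n_train:]]
```

Shuffling before splitting avoids assigning one contiguous depth interval entirely to the test set, which would bias the evaluation if log responses vary systematically with depth.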
            testing data. When the size of the dataset available for building a data-driven
            model increases, we can choose a larger percentage of data to constitute the