networks. The four models were trained and tested on a desktop computer with a 3.5 GHz CPU and 32 GB RAM. As the number of training steps increases, the model performance on both the training and testing datasets improves; that is, the prediction accuracy on both datasets increases. When training a deep neural network model, the performance on both the training and testing datasets needs to be monitored to identify the optimum number of training steps. When the number of training steps exceeds this optimum, the training loss continues to improve, but the performance on the testing dataset first stabilizes and then starts to degrade. This indicates that training the model beyond the optimal number of training steps results in overfitting of the training dataset and a reduction in the model's generalization capability. The number of training steps is selected by monitoring the mean squared error (MSE) on both the training and testing datasets. In the initial training stage, the MSEs of the model on the training and testing datasets decrease as the model learns from the training dataset. After a certain number of training steps, the testing MSE starts increasing; this is the point where the model starts overfitting the training dataset. To avoid overfitting, we invoke an early stopping criterion based on the performances on the training and testing datasets.
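Such an early stopping criterion can be implemented, for example, as a callback that halts training once the testing MSE stops improving. The following minimal sketch assumes a Keras-style model trained on synthetic stand-in data; the network, patience value, and variable names are illustrative and are not the architectures or settings used in this chapter.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the log inputs and the 64-bin NMR T2 targets.
x_train, y_train = np.random.rand(200, 10), np.random.rand(200, 64)
x_test, y_test = np.random.rand(50, 10), np.random.rand(50, 64)

# Placeholder network; not one of the four architectures discussed here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(64),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping: halt training once the MSE on the testing dataset stops
# improving, i.e., the point where the model begins to overfit.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=50, restore_best_weights=True)

model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          epochs=1000, verbose=0, callbacks=[early_stop])
```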
For purposes of evaluating the model performances, we use the coefficient of determination, $R^2$, for each depth or averaged over all the depths of the testing dataset. $R^2$ as a performance metric has limitations and assumptions that need to be considered prior to its implementation. $R^2$ for a specific depth is calculated using the predicted NMR T2 and measured NMR T2 distributions, such that
$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{7.3}$$
where $\hat{y}_i$ and $y_i$ are the predicted and measured NMR T2 amplitudes for bin $i$, $\bar{y}$ is the mean of the measured NMR T2 amplitudes over all bins, and $n$ is the total number of bins in the T2 distribution (i.e., 64).
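As a concrete illustration, the per-depth $R^2$ of Eq. (7.3) can be computed directly from the measured and predicted 64-bin T2 amplitudes. The sketch below uses NumPy with synthetic placeholder values; the array names are hypothetical.

```python
import numpy as np

def r_squared(y_measured, y_predicted):
    """Coefficient of determination, Eq. (7.3), for one depth,
    computed over the n = 64 bins of the NMR T2 distribution."""
    ss_res = np.sum((y_measured - y_predicted) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_measured - np.mean(y_measured)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted T2 amplitudes for a single depth.
y_meas = np.random.rand(64)
y_pred = y_meas + 0.05 * np.random.randn(64)
print(r_squared(y_meas, y_pred))
```

Averaging this quantity over all depths of the testing dataset gives the aggregate metric mentioned above.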
Each model computes a different number of parameters depending on its architecture. For example, 4608 and 5234 parameters are computed
when training the generator and discriminator networks, respectively, of the
GAN. Each model requires a different number of training steps and has
different training times depending on the model architecture (see Table 7.1).
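For readers who wish to reproduce such comparisons, the number of trainable parameters and the wall-clock time for a fixed number of training steps can be obtained as sketched below. The sketch assumes a Keras-style model; the network shown is a placeholder rather than the GAN, VAEc-NN, or LSTM architectures of this chapter, and the timings it produces are machine dependent.

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder network; layer widths are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(64),
])
model.compile(optimizer="adam", loss="mse")

# Number of parameters updated during training.
n_params = int(sum(np.prod(w.shape) for w in model.trainable_weights))
print("trainable parameters:", n_params)

# Wall-clock time for 1000 training steps (one full-batch step per epoch).
x, y = np.random.rand(200, 10), np.random.rand(200, 64)
start = time.time()
model.fit(x, y, epochs=1000, batch_size=200, verbose=0)
print("training time (s):", time.time() - start)
```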
For example, it takes 41.25 s for the two-stage training of the VAEc-NN to run 1000 training steps on the training data. The LSTM requires more time to train than the VAEc-NN; it took 566.33 s to train the LSTM model for 1000 training steps. The VAEc-NN model training computes 1825 parameters, whereas the LSTM model training computes 2071 parameters. Even though the two models have a similar number of parameters and the LSTM has a single training stage as compared with the two-stage VAEc-NN, the large difference in the training times of the LSTM and VAEc-NN is due to the complexity of calculations in