correlation between the cluster numbers generated by the K-means
clustering and the relative error of the log synthesis, especially for the logs
synthesized using the ANN model. According to Fig. 5.9, the reliability of
ANN-based synthesis of DTC and DTS logs is highest for the
formation depths labeled as cluster number 1 by the K-means clustering.
Formation depths assigned cluster number 2 by the K-means clustering
have low reliability of ANN-based DTC and DTS log synthesis.
To better visualize the clustering results of each clustering method, we use
the t-distributed stochastic neighbor embedding (t-SNE) dimensionality-reduction
algorithm to project the 13-dimensional feature space onto a two-dimensional
space. Following that, each sample (i.e., formation depth) projected onto the
two-dimensional space is assigned a color based on the cluster number
generated for that sample by the clustering method. Dimensionality reduction
using the t-SNE method enables us to plot each formation depth with 13 log
responses as a point in a two-dimensional space while preserving the high-
dimensional topological relationships of the formation depth with
neighboring depths and with other depths exhibiting similar log responses.
This visualization technique helps us compare the characteristics of the
various clustering methods implemented in our study.
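A minimal sketch of this workflow is shown below, assuming scikit-learn. The array name `logs`, the random stand-in data, the choice of two clusters, and the specific estimator settings are our assumptions for illustration, not the authors' exact implementation: the sketch standardizes the 13 log responses at each depth, assigns K-means cluster numbers, projects the depths onto two dimensions with t-SNE, and colors each projected depth by its cluster number.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Hypothetical inputs: one row per formation depth, 13 log responses per row.
rng = np.random.default_rng(0)
logs = rng.normal(size=(1000, 13))  # replace with the actual 13 logs per depth

# Standardize the logs so no single log dominates the distance computations.
X = StandardScaler().fit_transform(logs)

# Assign a cluster number to each formation depth (2 clusters assumed here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project the 13-dimensional feature space onto a two-dimensional space.
embedding = TSNE(n_components=2, perplexity=100, random_state=0).fit_transform(X)

# Color each projected depth by its K-means cluster number.
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="viridis", s=5)
plt.xlabel("t-SNE component 1")
plt.ylabel("t-SNE component 2")
plt.title("Formation depths colored by K-means cluster number")
plt.show()
```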
t-SNE is one of the most effective algorithms for nonlinear dimensionality
reduction. The basic principle is to quantify the similarities between all
possible pairs of data points in the original, high-dimensional feature space
and construct a probability distribution of the topological similarity present in
the dataset. When projecting the data points into the low-dimensional space, the
t-SNE algorithm arranges the data points to achieve a probability distribution
of topological relationships similar to that in the original, high-dimensional
feature space. This is accomplished by minimizing the difference (the
Kullback-Leibler divergence) between the two probability distributions, one for
the original high-dimensional space and the other for the low-dimensional
space. Consequently, if a data point is similar to another in the
high-dimensional space, it is very likely to be placed as a neighbor of that point
in the low-dimensional space. To apply t-SNE, we need to define its
hyperparameters, namely, the perplexity and the number of training steps.
Perplexity loosely sets the effective number of neighbors considered for each
point; it usually ranges from 5 to 50 and needs to be larger for larger datasets.
In our study, we tested a range of values and selected the perplexity and the
number of training steps to be 100 and 5000, respectively.
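A sketch of how such a hyperparameter scan might look is given below, assuming scikit-learn's TSNE, whose `perplexity` and `n_iter` arguments correspond to the perplexity and the number of training steps discussed above (newer scikit-learn releases rename `n_iter` to `max_iter`). The candidate perplexity values and the random stand-in data are ours for illustration; in practice the resulting maps are typically compared visually, since KL divergence values are not strictly comparable across perplexities.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# X: standardized (n_depths, 13) log array; random stand-in for illustration.
X = StandardScaler().fit_transform(np.random.default_rng(0).normal(size=(1000, 13)))

for perplexity in (5, 30, 50, 100):
    tsne = TSNE(
        n_components=2,       # project onto a two-dimensional space
        perplexity=perplexity,
        n_iter=5000,          # training steps (renamed max_iter in newer sklearn)
        init="pca",           # PCA initialization stabilizes the layout
        random_state=0,
    )
    embedding = tsne.fit_transform(X)
    # kl_divergence_ reports how closely the low-dimensional similarity
    # distribution matches the high-dimensional one for this perplexity.
    print(f"perplexity={perplexity:>3}  KL divergence={tsne.kl_divergence_:.3f}")
```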
Fig. 5.10 presents the results of dimensionality reduction using the t-SNE
method. Fig. 5.10 has four subplots; each subplot uses the same manifold
from t-SNE but is colored with different information, namely the relative error
in log synthesis, the lithology, and the cluster numbers obtained from the
various clustering methods. Mathematically, a manifold is a continuous
geometric structure. When dealing with high-dimensional data in machine
learning, we sometimes assume that the data can be represented using a
low-dimensional manifold. Each point on the plots represents one specific
depth in the formation. The t-SNE algorithm projects all the input logs for each
depth as a point on the t-SNE plot. Formation depths that are similar to each other