Page 178 - Machine Learning for Subsurface Characterization


correlation between the cluster numbers generated by K-means clustering and the relative error of the log synthesis, especially for the logs synthesized using the ANN model. According to Fig. 5.9, the reliability of ANN-based synthesis of DTC and DTS logs is highest at the formation depths labeled cluster number 1 by K-means clustering, whereas the formation depths assigned cluster number 2 show low reliability of ANN-based DTC and DTS log synthesis.
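The cluster-versus-error comparison above can be sketched as follows. This is a minimal illustration with synthetic stand-in values; the cluster labels and error magnitudes are invented for demonstration, not taken from the study's data.

```python
# Hypothetical sketch: how ANN log-synthesis relative error could be
# summarized per K-means cluster. Labels and errors are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Assumed: one K-means cluster number (1 or 2) and one relative error (%)
# of ANN-based log synthesis per formation depth.
labels = rng.integers(1, 3, size=200)               # cluster numbers 1 and 2
rel_err = np.where(labels == 1,
                   rng.normal(5.0, 1.0, size=200),  # cluster 1: low error
                   rng.normal(15.0, 3.0, size=200)) # cluster 2: high error

# Mean relative error per cluster indicates where ANN synthesis is reliable.
mean_err = {c: rel_err[labels == c].mean() for c in (1, 2)}
print(mean_err)
```

A low mean error for cluster 1 and a high mean error for cluster 2 would reproduce the pattern described for Fig. 5.9.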
   To better visualize the results of each clustering method, we use the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction algorithm to project the 13-dimensional feature space onto a two-dimensional space. Each sample (i.e., formation depth) projected onto the two-dimensional space is then assigned a color based on the cluster number generated for that sample by the clustering method. Dimensionality reduction using t-SNE enables us to plot each formation depth with 13 log responses as a point in a two-dimensional space while preserving the high-dimensional topological relationships of the formation depth with neighboring depths and with other depths exhibiting similar log responses. This visualization technique helps us compare the characteristics of the various clustering methods implemented in our study.
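The visualization workflow just described can be sketched with scikit-learn. The data here are random placeholders for the 13 log responses, and the array names are illustrative; note also that scikit-learn numbers clusters from 0 rather than 1.

```python
# Sketch of the workflow: cluster depths in the original 13-D log-response
# space, project the same samples to 2-D with t-SNE, and use the cluster
# number of each depth as its color in the 2-D scatter plot.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
logs = rng.normal(size=(120, 13))     # 120 formation depths x 13 log responses

# K-means clustering in the original 13-D feature space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(logs)
cluster_id = kmeans.labels_           # one cluster number per depth

# Project the 13-D samples onto 2-D while preserving neighborhood structure.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(logs)

# Each row of `embedding` is one depth; `cluster_id` supplies its color.
print(embedding.shape, np.unique(cluster_id))
```

A scatter plot of `embedding` colored by `cluster_id` then reproduces one panel of the kind of figure discussed here.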
   t-SNE is one of the most effective algorithms for nonlinear dimensionality reduction. The basic principle is to quantify the similarities between all possible pairs of data points in the original, high-dimensional feature space and construct a probability distribution of the topological similarity present in the dataset. When projecting the data points into the low-dimensional space, the t-SNE algorithm arranges them to achieve a probability distribution of topological relationships similar to that in the original, high-dimensional feature space. This is accomplished by minimizing the difference between the two probability distributions, one for the original high-dimensional space and the other for the low-dimensional space. Consequently, if a data point is similar to another in the high-dimensional space, it is very likely to be placed as a neighbor in the low-dimensional space. To apply t-SNE, we need to define two hyperparameters: perplexity and the number of training steps. Perplexity roughly sets the effective number of neighbors considered for each point; it usually ranges from 5 to 50 and should be larger for larger datasets. In our study, we tested a range of values and selected a perplexity of 100 and 5000 training steps.
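The principle of matching two similarity distributions can be made concrete with a small numpy sketch. It builds the high-dimensional pairwise-similarity distribution P with a Gaussian kernel, the low-dimensional distribution Q with a heavy-tailed Student-t kernel, and evaluates the Kullback-Leibler divergence that t-SNE's gradient descent minimizes. A fixed bandwidth `sigma` stands in for the per-point bandwidths that the perplexity setting would normally determine.

```python
# Minimal illustration of the t-SNE objective: KL(P || Q) between a Gaussian
# similarity distribution in high-D and a Student-t distribution in low-D.
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x."""
    s = (x ** 2).sum(axis=1)
    return s[:, None] + s[None, :] - 2.0 * x @ x.T

def p_high(x, sigma=1.0):
    """Gaussian pairwise-similarity distribution in the original space."""
    d = pairwise_sq_dists(x)
    np.fill_diagonal(d, np.inf)        # a point is not its own neighbor
    p = np.exp(-d / (2.0 * sigma ** 2))
    return p / p.sum()                 # normalize to a probability distribution

def q_low(y):
    """Student-t (heavy-tailed) similarity distribution in the embedding."""
    d = pairwise_sq_dists(y)
    np.fill_diagonal(d, np.inf)
    q = 1.0 / (1.0 + d)
    return q / q.sum()

def kl_divergence(p, q):
    """Cost that t-SNE minimizes by rearranging the low-D points."""
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 13))          # 50 depths in the 13-D feature space
y = rng.normal(size=(50, 2))           # a candidate 2-D embedding

P, Q = p_high(x), q_low(y)
print(kl_divergence(P, Q))             # gradient descent would lower this
```

A full t-SNE run repeatedly adjusts `y` to reduce this divergence, which is what makes similar high-dimensional points land near each other in the plot.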
   Fig. 5.10 presents the results of dimensionality reduction using the t-SNE method. Fig. 5.10 has four subplots; each subplot uses the same manifold from t-SNE but is colored with different information, such as the relative error in log synthesis, lithology, and the cluster numbers obtained from the various clustering methods. Mathematically, a manifold is a continuous geometric structure. When dealing with high-dimensional data in machine learning, we sometimes assume that the data can be represented using a low-dimensional manifold. Each point on the plots represents one specific depth in the formation. The t-SNE algorithm projects all the input logs for each depth as a point on the t-SNE plot. Formation depths that are similar to each other