centers are shifted until the distortion/inertia metric converges, that is, until further iterations to find better clusters no longer shift the cluster centers appreciably.
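A minimal scikit-learn sketch of this K-means convergence behavior is given below; the synthetic array X is only a placeholder for the scaled well logs, and the parameter values are illustrative assumptions rather than values from the study.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # placeholder for the scaled "easy-to-acquire" logs

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.inertia_)          # distortion/inertia metric that the iterative updates minimize
print(kmeans.labels_[:10])      # cluster assignments for the first few samples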
The Gaussian mixture model (GMM) assumes that the clusters in the dataset are generated from a mixture of Gaussian (normal) distributions. The data points in the multidimensional feature space are fitted to multivariate normal distributions whose parameters maximize the posterior probability of the distributions given the data.
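A minimal sketch of this idea with scikit-learn's GaussianMixture follows; the number of components, the covariance structure, and the synthetic data are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # placeholder for the scaled well logs

gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0).fit(X)
labels = gmm.predict(X)         # most probable Gaussian component for each data point
probs = gmm.predict_proba(X)    # probability of each component given the data point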
The hierarchical clustering model clusters the dataset by repeatedly merging (agglomerative) or splitting (divisive) the data based on certain similarities to generate a hierarchy of clusters. For example, agglomerative hierarchical clustering using Euclidean distance as the measure of similarity repeatedly executes the following two steps: (1) identify the two clusters that are closest to each other, and (2) merge these two closest clusters, with the assumption that the proximity of clusters indicates their similarity. This continues
until all the clusters are merged together.
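A minimal agglomerative sketch with scikit-learn is given below, assuming Ward linkage on Euclidean distances and three final clusters; both choices are illustrative, not taken from the study.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # placeholder for the scaled well logs

# Repeatedly merges the two closest clusters until only three clusters remain.
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = agg.fit_predict(X)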
DBSCAN clusters the data points based on their local density. The algorithm groups points that have many neighbors and labels points with few neighbors as outliers. DBSCAN requires the user to define the minimum number of points needed to form a cluster and the maximum distance between two points for them to be considered neighbors belonging to the same dense region.
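The sketch below shows these two user-defined parameters in scikit-learn's DBSCAN (eps for the neighborhood distance and min_samples for the minimum number of points); the specific values and the synthetic data are illustrative assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # placeholder for the scaled well logs

db = DBSCAN(eps=0.5, min_samples=10).fit(X)
labels = db.labels_             # points labeled -1 are treated as outliers/noise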
The fifth clustering technique, SOM, utilizes a neural network for unsupervised dimensionality reduction by projecting the high-dimensional data onto a two-dimensional space while maintaining the original similarity between the data points. Here, we first apply the SOM projection and then use K-means to cluster the dimensionality-reduced data in the lower-dimensional feature space into groups.
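A sketch of this two-step workflow is shown below, assuming the third-party minisom package for the SOM projection; the grid size, training length, and synthetic data are illustrative assumptions.

import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # placeholder for the scaled well logs

# Train a 20-by-20 SOM and project each sample to the 2D coordinates of its best-matching unit.
som = MiniSom(20, 20, input_len=X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(X)
som.train_random(X, 5000)
projected = np.array([som.winner(x) for x in X], dtype=float)

# Cluster the dimensionality-reduced (2D) data into groups with K-means.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(projected)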
We first applied the five clustering techniques to all the “easy-to-acquire” logs (features). The clusters so obtained did not exhibit any correlation with the performances of the shallow-learning regression models for the synthesis of DTS and DTC logs. Clustering methods that use Euclidean distance, for example, K-means and DBSCAN, perform poorly in high-dimensional feature
            space due to the curse of dimensionality. High dimensionality and high
            nonlinearity when using all the 13 “easy-to-acquire” logs resulted in complex
            relationships among the features that were challenging for the clustering
            algorithms to resolve into reliable clusters. In order to avoid the curse of
            dimensionality, only three “easy-to-acquire” logs, namely, DPHZ, NPOR, and
            RHOZ, were used for the desired clustering because these logs exhibit good
            correlations with the log synthesis performance of the shallow-learning models
            (Fig. 5.3). We chose these three logs to build the clusters for determining the
            reliability of log synthesis using the shallow-learning models in new wells. For
the five clustering techniques, we processed the three selected features, namely, DPHZ, NPOR, and RHOZ, to generate only three clusters that could show potential correlation with the good, intermediate, and bad log-synthesis performances, respectively, of the shallow-learning models.
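A minimal sketch of this step, using K-means as one representative of the five techniques and a hypothetical DataFrame standing in for the Well 1 logs (the real logs would be loaded from file), could look like the following; the feature scaling and parameter values are illustrative assumptions.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical stand-in for the Well 1 logs; only DPHZ, NPOR, and RHOZ are used as features.
rng = np.random.default_rng(0)
df = pd.DataFrame({'DPHZ': rng.normal(0.15, 0.05, 1000),
                   'NPOR': rng.normal(0.20, 0.05, 1000),
                   'RHOZ': rng.normal(2.5, 0.15, 1000)})

scaled = StandardScaler().fit_transform(df[['DPHZ', 'NPOR', 'RHOZ']])
df['cluster'] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)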
The 4240-ft formation in Well 1 is clustered into three clusters by processing the DPHZ, NPOR, and RHOZ logs. Following that, the averaged cluster numbers of each 50-ft depth interval and the averaged relative errors in log synthesis for each