
146    Machine learning for subsurface characterization


            samples, one from each cluster. The merging process forms a hierarchical tree
            of clusters. In Fig. 5.5B, most of the cluster numbers are 0, which indicates that
            hierarchical clustering finds most of the samples to be similar and groups most
            of the formation depths into one cluster. The results shown in Figs. 5.5B and 5.8
            demonstrate that the hierarchical clustering algorithm performs poorly at
            differentiating the formation depths.
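            The merge-based procedure described above can be sketched with scikit-learn's
            agglomerative implementation. This is an illustrative example on synthetic data,
            not the chapter's formation-depth dataset; the cluster count and feature
            dimensions are assumed values.

            ```python
            # Illustrative agglomerative (hierarchical) clustering sketch on
            # synthetic data; the dataset, cluster count, and linkage choice
            # are assumptions, not the chapter's actual settings.
            from sklearn.cluster import AgglomerativeClustering
            from sklearn.datasets import make_blobs

            # Synthetic stand-in for log responses measured at formation depths.
            X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)

            # Agglomerative clustering starts with every sample as its own
            # cluster and repeatedly merges the two closest clusters,
            # building a hierarchical tree (dendrogram) bottom-up.
            model = AgglomerativeClustering(n_clusters=4, linkage="ward")
            labels = model.fit_predict(X)

            print(sorted(set(labels)))  # cluster numbers assigned to the samples
            ```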


            2.5.4 DBSCAN clustering
            DBSCAN is a density-based clustering method. Unlike K-means clustering,
            the DBSCAN method does not need the user to manually define the number of
            clusters. Instead, it requires the user to define the minimum number of
            neighbors to be considered in a cluster and the maximum allowed distance
            between any two points for them to be part of the same cluster. Within the
            user-defined distance around a sample, DBSCAN counts the number of
            neighbors. When the number of neighbors within the specified distance
            (i.e., the data density) exceeds the threshold, DBSCAN identifies that
            group of data points as belonging to one cluster. Based on our extensive
            study, we set the minimum number of neighbors to 100 and the distance
            to 10. Fig. 5.5C shows that the DBSCAN clustering method identifies
            many data points as outliers, which are assigned to cluster number 1, and
            most of the formation depths are clustered into cluster number 0 (Fig. 5.8).
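            The two DBSCAN parameters described above map directly onto scikit-learn's
            `eps` (maximum neighborhood distance) and `min_samples` (minimum neighbor
            count). The sketch below uses parameter values scaled to a small synthetic
            dataset, not the chapter's settings (distance 10, 100 neighbors) for the
            formation-depth data.

            ```python
            # Illustrative DBSCAN sketch; eps and min_samples are scaled to
            # this synthetic data and are not the chapter's settings.
            import numpy as np
            from sklearn.cluster import DBSCAN
            from sklearn.datasets import make_blobs

            X, _ = make_blobs(n_samples=500, centers=[[0, 0], [5, 5], [-5, 5]],
                              cluster_std=0.5, random_state=0)
            # Add isolated points far from the blobs; these should be
            # flagged as outliers because their neighborhoods are sparse.
            outliers = np.array([[20.0, 20.0], [-20.0, 20.0],
                                 [20.0, -20.0], [-20.0, -20.0]])
            X = np.vstack([X, outliers])

            # eps: maximum allowed distance between two neighboring points;
            # min_samples: minimum neighbors for a point to seed a cluster.
            db = DBSCAN(eps=0.8, min_samples=10).fit(X)

            # scikit-learn numbers clusters 0, 1, ... and labels outliers -1.
            n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
            n_noise = int(np.sum(db.labels_ == -1))
            print(n_clusters, n_noise)
            ```

            Note that scikit-learn marks outliers with the label -1 rather than
            grouping them into a numbered cluster as in Fig. 5.5C.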

            2.5.5 SOM followed by K-means clustering
            Self-organizing map (SOM) is a neural network-based dimensionality reduction
            algorithm generally used to represent a high-dimensional dataset as a two-
            dimensional discretized pattern. Reduction in dimensionality is performed
            while retaining the topology of the data in the original feature space. In
            this study, we perform SOM dimensionality reduction followed by K-means
            clustering; the clustering method is essentially K-means clustering
            performed on the mapping generated by the SOM. As the first step, an
            artificial neural network is trained to generate a low-dimensional
            discretized representation of the data in the original feature space while
            preserving the topological properties; this is achieved through competitive
            learning. In a SOM, vectors that are close in the high-dimensional space
            end up mapped to SOM nodes that are also close in the low-dimensional space.
            K-means can be considered a simplified case of SOM, wherein the nodes
            (centroids) are independent of each other. K-means is highly sensitive to
            the initial positions of the centroids, and it is not well suited for high-
            dimensional datasets. The two-stage clustering procedure adopted in this
            study first uses SOM to produce the low-dimensional prototypes
            (abstractions), which are then clustered in the second stage using K-means.
            This two-step clustering method reduces the computational time and improves
            the efficiency of K-means clustering.
            efficiency of K-means clustering. Even with relatively small number of
            samples, many clustering algorithms—especially hierarchical ones—become