Page 206 - Machine Learning for Subsurface Characterization



            finds k cluster centers that minimize the total squared Euclidean distance between
            the cluster centers and the data points belonging to the corresponding clusters.
            The number of clusters k is specified prior to clustering the dataset;
            consequently, K-means benefits from using the elbow method to determine the
            optimum number of clusters. With k equal to 4, we assign an integer-valued
            cluster number ranging from 1 to 4 to each depth in the shale
            formation under investigation. KC-index is a compilation of the cluster
            numbers. Each of the four clusters has an associated cluster center that
            determines the EOR potential of the cluster, which ranges from low through
            low-intermediate and high-intermediate to high.
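The clustering and elbow-method steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the workflow used in the study: the two-feature synthetic samples, the helper names (`squared_distance`, `kmeans`, `make_blobs`), and the random seeds are all assumptions made for the example. Real inputs would be the six log-derived features at each depth.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels in 1..k, centers, inertia)."""
    rng = random.Random(seed)
    # Initialize the centers at k distinct, randomly chosen samples.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(s, centers[j]))
            for s in samples
        ]
        # Update step: each center becomes the mean of its member samples.
        for j in range(k):
            members = [s for s, lab in zip(samples, new_labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    # Inertia: total squared distance of the samples to their own centers.
    inertia = sum(
        squared_distance(s, centers[lab]) for s, lab in zip(samples, labels)
    )
    return [lab + 1 for lab in labels], centers, inertia


def make_blobs(seed=1):
    """Hypothetical data: four well-separated groups in a 2-feature space."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
    return [
        (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in blob_centers
        for _ in range(25)
    ]


samples = make_blobs()

# Elbow method: look for the k beyond which inertia stops dropping sharply.
inertia_by_k = {k: kmeans(samples, k)[2] for k in range(1, 8)}
for k, inertia in inertia_by_k.items():
    print(f"k={k}: inertia={inertia:.2f}")

# With k = 4, the list of per-sample cluster numbers plays the role of the KC-index.
kc_index, centers, _ = kmeans(samples, 4)
```

Because Lloyd's algorithm converges only to a local optimum, the result can depend on the initial centers. In practice the fit is usually repeated with several initializations and the lowest-inertia run is kept.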


            5.2 Calculation of the KC-index

            NMR-derived permeability, porosity, water saturation, oil saturation, NMR-
            derived bound-fluid porosity, and an approximation of pore aperture radius
            corresponding to the 35th percentile mercury saturation (r35) are used as
            inputs (features) to generate the KC-index. r35 characterizes the flow
            capacity of the rock and is generated using a Winland-type equation:
                           \log r_{35} = 0.732 + 0.588 \log K_a - 0.864 \log \Phi     (6.7)
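As a worked example, the Winland-type relation can be evaluated directly. The sketch below assumes the conventional Winland units (permeability in millidarcies, porosity in percent, r35 in micrometers), which this section does not state. The function name `winland_r35` and the input values are hypothetical.

```python
import math


def winland_r35(k_md, porosity_percent):
    """Pore aperture radius r35 from a Winland-type equation.

    Assumed units (conventional Winland form, not stated in the text):
    k_md in millidarcies, porosity_percent in percent, result in micrometers.
    """
    log_r35 = (
        0.732
        + 0.588 * math.log10(k_md)
        - 0.864 * math.log10(porosity_percent)
    )
    return 10.0 ** log_r35


# Hypothetical sample: 1 mD permeability and 10% porosity.
# log r35 = 0.732 + 0.588 * 0 - 0.864 * 1 = -0.132
r35_example = winland_r35(1.0, 10.0)
print(f"r35 = {r35_example:.3f}")
```

At fixed porosity, r35 increases with permeability, which is consistent with its use as an indicator of flow capacity.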
               For a low-dimensional feature space, correlations among the input logs do
            not significantly affect the clustering results because the clustering is based
            on the similarity between various samples in the dataset. However, an increase
            in the number of input logs (features) leads to the curse of dimensionality that
            adversely affects the K-means clustering because, in high-dimensional
            feature space, several data points have similar Euclidean distances from the
            cluster centers. As a result, the use of Euclidean distance or any other
            formulation of distance to quantify similarity of samples to the cluster centers
            for grouping the data into clusters becomes ambiguous and breaks down in a
            high-dimensional feature space. The center point of each cluster generated by
            K-means method represents the average of the log responses (feature values)
            of the samples in that cluster. The cluster centers of the four clusters (shown
            in Table 6.2) are physically consistent with the EOR potential represented by
            the corresponding clusters, that is, high, low, high-intermediate, and
            low-intermediate recovery potentials. Clustering assigns a cluster number to each
            depth, which can then be used to identify the oil-recovery potential of
            light-hydrocarbon injection for the given depth.
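The loss of distance contrast in high-dimensional feature spaces, mentioned in the paragraph above, can be illustrated numerically. The sketch below is an assumption-laden toy experiment, not an analysis of the well-log data: it draws uniformly random points in a unit hypercube and measures how much their Euclidean distances to a fixed center differ, relative to the smallest distance. The function name `relative_contrast`, the point count, and the list of dimensions are chosen only for illustration.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """(max - min) / min of distances from random points to a fixed center."""
    rng = random.Random(seed)
    center = [0.5] * n_dims  # center of the unit hypercube
    distances = [
        math.dist(center, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


# A small contrast means most points are nearly equidistant from the center,
# so nearest-center assignment carries little information.
contrast_by_dim = {d: relative_contrast(d) for d in (2, 6, 50, 500)}
for d, contrast in contrast_by_dim.items():
    print(f"dims={d:4d}  relative contrast={contrast:.3f}")
```

The contrast shrinks as the number of dimensions grows, which is why adding many input logs makes the nearest-center assignment in K-means less discriminating.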
               Cluster 4 represents depths where light-hydrocarbon injection will yield the
            best displacement efficiency because of the high permeability and high oil
            saturation, whereas Cluster 1 represents depths of low EOR potential
            because of low permeability and low oil saturation. The water saturation and
            oil saturation do not sum to 1 because the cluster centers do not represent
            the real data. As listed in Table 6.2, Cluster 4 has higher oil porosity,
            permeability, and r35, and lower water porosity and bound-fluid porosity,
            than the other clusters. According to Fig. 6.2, there are very