Page 206 - Machine Learning for Subsurface Characterization


finds k cluster centers that minimize the total squared Euclidean distance between
the cluster centers and the data points belonging to the corresponding clusters.
The number of clusters k is specified prior to clustering the dataset;
consequently, K-means benefits from using the elbow method to determine the
optimum number of clusters. With k equal to 4, we assign an integer-valued
cluster number ranging from 1 to 4 to each depth in the shale
formation under investigation. The KC-index is a compilation of these cluster
numbers. Each of the four clusters has an associated cluster center that
determines the EOR potential of the cluster, which ranges from low through
low-intermediate and high-intermediate to high.
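
As an illustration of the clustering and elbow analysis described above, the following minimal sketch implements K-means (Lloyd's algorithm) in plain Python and reports the within-cluster sum of squared distances for k = 1 to 6. The synthetic two-feature dataset, the function names (`sq_dist`, `farthest_first_init`, `kmeans`), and the deterministic farthest-first seeding are assumptions made for this example and are not taken from the study; in practice a library implementation would normally be applied to standardized log data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the first
    sample, then repeatedly add the sample farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centers, labels, inertia), where inertia is
    the total squared distance of the samples to their assigned centers."""
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its member samples.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia

# Hypothetical data: four well-separated groups of 25 samples, two features each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
          for cx, cy in blob_centers for _ in range(25)]

# Elbow analysis: inertia as a function of the number of clusters.
inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, w in inertias.items():
    print(f"k = {k}: inertia = {w:.1f}")

# Integer-valued cluster numbers from 1 to 4, one per sample (depth).
_, labels4, _ = kmeans(points, 4)
cluster_numbers = [lab + 1 for lab in labels4]
```

For this synthetic dataset the inertia drops sharply up to k = 4 and flattens afterwards, which is the "elbow" used to select the number of clusters. The list `cluster_numbers` plays the role of the KC-index in this toy setting.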

5.2 Calculation of the KC-index

NMR-derived permeability, porosity, water saturation, oil saturation, NMR-
derived bound-fluid porosity, and an approximation of pore aperture radius
corresponding to the 35th percentile mercury saturation (r35) are used as
inputs (features) to generate the KC-index. r35 characterizes the flow
capacity of the rock and is generated using a Winland-type equation:
$$\log r_{35} = 0.732 + 0.588 \log K_a - 0.864 \log \Phi \tag{6.7}$$
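
The Winland-type relation in Eq. (6.7) can be evaluated directly, as in the short sketch below. The function name `winland_r35`, the example input values, and the unit conventions (permeability in mD, porosity in percent, r35 in micrometers, base-10 logarithms) are assumptions based on the customary form of the Winland equation; the text above does not state the units.

```python
import math

def winland_r35(k_air_md, porosity_pct):
    """Pore aperture radius r35 from Eq. (6.7).

    Assumed units (not stated in the text): k_air_md in millidarcy,
    porosity_pct in percent, result in micrometers; logs are base 10.
    """
    log_r35 = (0.732
               + 0.588 * math.log10(k_air_md)
               - 0.864 * math.log10(porosity_pct))
    return 10.0 ** log_r35

# Hypothetical shale-like inputs, for illustration only.
for k_md, phi in [(0.001, 6.0), (0.01, 8.0), (0.1, 8.0)]:
    print(f"K = {k_md} mD, porosity = {phi}% -> r35 = {winland_r35(k_md, phi):.4f}")
```

At fixed porosity, r35 increases with permeability, and at fixed permeability it decreases with porosity, which is why r35 serves as an indicator of the flow capacity of the rock.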
For a low-dimensional feature space, correlations among the input logs do
not significantly affect the clustering results because the clustering is based
on the similarity between various samples in the dataset. However, an increase
in the number of input logs (features) leads to the curse of dimensionality, which
adversely affects K-means clustering because, in a high-dimensional
feature space, many data points have similar Euclidean distances from the
cluster centers. As a result, Euclidean distance, or any other
formulation of distance, becomes an ambiguous measure of the similarity of
samples to the cluster centers, and grouping the data into clusters breaks down in a
high-dimensional feature space. The center point of each cluster generated by
the K-means method represents the average of the log responses (feature values)
of the samples in that cluster. The cluster centers of the four clusters (shown
in Table 6.2) are physically consistent with the EOR potential represented by
the corresponding clusters, that is, high, low, high-intermediate, and
low-intermediate recovery potentials. Clustering assigns a cluster number to each
depth, which can then be used to identify the oil-recovery potential of
light-hydrocarbon injection at that depth.
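
The loss of distance contrast in high-dimensional feature spaces can be illustrated numerically. The sketch below draws points uniformly from the unit hypercube (an assumption for illustration, not the log data of this study) and computes the relative contrast, (d_max - d_min) / d_min, of the Euclidean distances to a single reference point. The function name `relative_contrast` and the chosen feature counts are hypothetical.

```python
import math
import random

def relative_contrast(n_samples, n_features, rng):
    """(d_max - d_min) / d_min for Euclidean distances from one reference
    point to n_samples points drawn uniformly from the unit hypercube."""
    ref = [rng.random() for _ in range(n_features)]
    dists = []
    for _ in range(n_samples):
        p = [rng.random() for _ in range(n_features)]
        dists.append(math.dist(p, ref))
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)
contrast = {d: relative_contrast(500, d, rng) for d in (2, 6, 50, 500)}
for d, c in contrast.items():
    print(f"{d:4d} features: relative contrast = {c:.2f}")
```

As the number of features grows, the nearest and farthest samples end up at nearly the same distance from the reference point, so a distance-based assignment of samples to cluster centers becomes less discriminating.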
Cluster 4 represents depths where light-hydrocarbon injection will yield the
best displacement efficiency because of the high permeability and high oil
saturation, whereas Cluster 1 represents depths of low EOR potential
because of low permeability and low oil saturation. The water saturation and
oil saturation do not sum to 1 because the cluster centers do not represent
the real data. As listed in Table 6.2, Cluster 4 has higher oil
porosity, permeability, and r35, and lower water porosity and bound-fluid
porosity, than the other clusters. According to Fig. 6.2, there are very
