Page 171 - Machine Learning for Subsurface Characterization

Robust geomechanical characterization Chapter  5 145


             cluster. As a result, GMM is not well suited for well logs. In Fig. 5.5A, the
             GMM-derived cluster number weakly correlates with the relative error with
             a Pearson correlation coefficient of  0.22. As explained above, each cluster
             number plotted in Fig. 5.5 is the averaged cluster number for a 50-ft depth
             interval, whereas each relative error in the log synthesis is the average of
             relative errors for the corresponding 50-ft depth interval. The points scatter
             all over the plot and show no obvious correlation. The Gaussian mixture
             model identifies several clusters by differentiating the formation depths, but
             the cluster patterns learned by the GMM are completely different from the
             patterns of relative errors of the shallow-learning models used for log
             synthesis (Fig. 5.8).
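
The interval averaging and correlation check described above can be sketched as follows. This is a minimal illustration, not the workflow used in the study: the depth sampling of 0.5 ft, the random cluster labels, the random relative errors, and the helper names `interval_means` and `pearson_r` are all assumptions introduced here. Only the 50-ft interval width and the 4240-ft total depth come from the text.

```python
import numpy as np


def interval_means(depth, values, interval_ft=50.0):
    """Average `values` over consecutive depth intervals of width `interval_ft`."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    # Index of the interval each sample falls in, counted from the shallowest depth.
    idx = np.floor((depth - depth.min()) / interval_ft).astype(int)
    n_bins = idx.max() + 1
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    occupied = counts > 0  # skip intervals that contain no samples
    return sums[occupied] / counts[occupied]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic placeholder data (assumed, not from the book).
rng = np.random.default_rng(0)
depth = np.arange(0.0, 4240.0, 0.5)                 # assumed 0.5-ft sampling
cluster_id = rng.integers(0, 5, size=depth.size)    # stand-in cluster labels
rel_error = rng.random(depth.size)                  # stand-in relative errors

avg_cluster = interval_means(depth, cluster_id)
avg_error = interval_means(depth, rel_error)
r = pearson_r(avg_cluster, avg_error)
print(f"{avg_cluster.size} intervals, Pearson r = {r:.2f}")
```

Both averaged series are built from the same depth array, so their intervals line up one to one before the correlation is computed. Note that averaging integer cluster labels treats them as ordered numbers, which is what the plotted "averaged cluster number" implies.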


             2.5.3 Hierarchical clustering
             Our application of the hierarchical clustering algorithm starts with every data
             point as its own cluster; clusters are then repeatedly merged based on their
             similarity (proximity) until the target number of clusters is reached. The
             clustering ends when each cluster is most distinct from other clusters and the
             samples within each cluster are most similar. The similarity of two clusters
             is evaluated based on the sum of squared Euclidean distances for all pairs of





























             FIG. 5.5 Averaged cluster numbers derived using (A) GMM, (B) hierarchical, (C) DBSCAN, and
             (D) SOM clustering versus the averaged relative errors in ANN-based synthesis of DTS and DTC
             logs. The averages are computed for each 50-ft depth interval in the 4240-ft depth interval
             of Well 1, where training and testing were done. The gray region indicates the 95% confidence
             interval.
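
The bottom-up merging used for the hierarchical clusters in panel (B) can be illustrated with SciPy's agglomerative clustering routines. This is a hedged sketch on synthetic two-dimensional points, not the study's well-log data or configuration. Ward linkage is an assumption made here because it is also based on squared Euclidean distances; it is not necessarily the merge criterion used in the study. The target of three clusters is likewise an illustrative choice.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for feature vectors: three well-separated blobs of 40 points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((40, 2)) for c in centers])

# Start from singleton clusters and record every merge.
# Z has one row per merge: (cluster a, cluster b, merge distance, new cluster size).
Z = linkage(X, method="ward")

# Cut the merge tree so that at most 3 flat clusters remain (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")

print("merges recorded:", Z.shape[0])
print("cluster sizes:", np.bincount(labels)[1:])
```

With real well logs the features would typically be scaled before clustering, since Euclidean distances are dominated by whichever log has the largest numerical range.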