Page 62 - Machine Learning for Subsurface Characterization


A few advantages of DBSCAN are as follows:
1. It is suitable for clusters of arbitrary shape, which need not be spherical,
convex, well separated, or compact.
2. It is robust to noise and outliers.
3. Unlike K-means, DBSCAN does not require the number of clusters to be
specified in advance; it infers that number from the dataset.
4. It is suited for detecting high-density regions and separating them from low-
density regions.
5. Compared with K-means and hierarchical clustering, the DBSCAN approach is
closer to the human intuition-based approach to clustering.
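
As a minimal sketch of the behavior listed above, the following pure-Python implementation of DBSCAN clusters a synthetic two-dimensional dataset made of a ring, a compact blob inside the ring, and one isolated point. The function names (`region_query`, `dbscan`), the toy data, and the parameter values are assumptions chosen for illustration; `bandwidth` is the neighborhood radius and `nmin` is the minimum number of neighbors, here counting the point itself. It is not an optimized or reference implementation.

```python
import math

NOISE = -1

def region_query(points, i, bandwidth):
    """Indices of all points within `bandwidth` of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= bandwidth]

def dbscan(points, bandwidth, nmin):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)            # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, bandwidth)
        if len(neighbors) < nmin:            # not a core point
            labels[i] = NOISE                # may later be claimed as a border point
            continue
        cluster += 1                         # start a new cluster at this core point
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:           # border point: reachable but not core
                labels[j] = cluster
            if labels[j] is not None:        # already assigned, nothing to expand
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, bandwidth)
            if len(j_neighbors) >= nmin:     # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Synthetic data (illustrative only): a ring, a blob inside it, and one outlier.
ring = [(5 * math.cos(2 * math.pi * k / 60), 5 * math.sin(2 * math.pi * k / 60))
        for k in range(60)]
blob = [(0.5 * math.cos(2 * math.pi * k / 12), 0.5 * math.sin(2 * math.pi * k / 12))
        for k in range(12)]
outlier = [(20.0, 20.0)]
points = ring + blob + outlier

labels = dbscan(points, bandwidth=1.2, nmin=4)
n_clusters = len(set(labels) - {NOISE})
print("clusters found:", n_clusters)
print("label of the isolated point:", labels[-1])
```

With these illustrative settings, the ring and the blob receive two different cluster labels even though the number of clusters was never specified and the ring is not convex, and the isolated point is labeled as noise.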
A few disadvantages of DBSCAN are as follows:
1. Like K-means and hierarchical clustering, DBSCAN is not suited for large,
high-dimensional datasets because of the numerous distance calculations
required to assess density.
2. It is computationally expensive.
3. It is not suited for datasets whose clusters differ widely in density.
4. Like K-means and hierarchical clustering, DBSCAN requires feature
scaling and dimensionality reduction.
5. The user must carefully select the values of bandwidth and nmin, because
the performance of the algorithm depends strongly on them. Domain
knowledge is sometimes needed to select good values for bandwidth and
nmin.
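
The last two points can be illustrated with a short sketch. The text above does not prescribe a selection procedure; the sorted k-distance curve used here is a commonly cited heuristic for choosing the bandwidth once nmin is fixed, and is shown only as one possible aid. The helper names (`standardize`, `k_distances`) and the seven two-feature samples are hypothetical, made-up values on deliberately different scales.

```python
import math
import statistics

def standardize(rows):
    """Scale every feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]   # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def k_distances(points, nmin):
    """Sorted distance from each point to its nmin-th nearest point.

    The point itself counts as the first neighbor, so a point is a core
    point for a given bandwidth exactly when its k-distance <= bandwidth.
    """
    out = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points)   # d[0] == 0.0 is p itself
        out.append(d[nmin - 1])
    return sorted(out)

# Made-up samples: the first feature spans about 0 to 1, the second about 100 to 900.
rows = [(0.10, 100.0), (0.11, 110.0), (0.12, 105.0),
        (0.30, 900.0), (0.31, 910.0), (0.29, 905.0),
        (0.90, 500.0)]

scaled = standardize(rows)          # without this, the second feature dominates distances
kd = k_distances(scaled, nmin=3)
for value in kd:
    print(round(value, 3))
```

In the printed curve, the densely packed samples have small k-distances and the isolated sample has a much larger one. A bandwidth chosen just above the small values, near the jump in the curve, would treat the packed samples as core points and leave the isolated sample as noise.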

5 Features/attributes for the proposed noninvasive
visualization of geomechanical alteration

Features are measurable properties or characteristics that describe a system/
phenomenon. For example, porosity, permeability, and oil saturation can be
considered as features for identifying a good hydrocarbon reservoir. Each
feature should be informative, discriminating, and independent to develop
robust unsupervised methods. An unsupervised clustering method is only as good
as the available features and the quality of data. The original data for this
study comprise shear waveforms, which are measurements of amplitude at
certain time steps (Fig. 2.5, top). Each waveform is made of 1375 signal
amplitudes measured at a time step of 40 ns. The sensor is placed opposite the
source in a transducer assembly. When scanning the axial surface,
waveforms are acquired every 1 mm along the diameter; consequently, 133
waveforms are collected by each of the seven transducer assemblies, for a
total of 931 waveforms. The waveform dataset from the axial surface has a
size of 931 samples and dimensionality of 1375. Such high dimensionality is
not conducive to distance-based or density-based clustering. High
dimensionality leads to several adverse effects, also referred to as the curse of
