Page 62 - Machine Learning for Subsurface Characterization



               A few advantages of DBSCAN are as follows:
            1. It is suitable for clusters of arbitrary shape, including clusters that
               are not spherical, convex, well separated, or compact.
            2. It is robust to noise and outliers.
            3. Unlike K-means, DBSCAN infers the optimal number of clusters from the
               dataset.
            4. It is suited for detecting high-density regions and separating them from low-
               density regions.
            5. Compared with K-means and hierarchical clustering, the DBSCAN approach
               is closer to the intuitive way a human would group data points.
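The first three advantages can be illustrated with a minimal, pure-Python sketch of the DBSCAN idea. This is not the implementation used in this study. The parameter names `eps` and `n_min` are assumed here to stand for the bandwidth and nmin discussed in the text, and the data (a ring of points enclosing a compact blob, plus one isolated point) are synthetic and chosen only for illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, n_min):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < n_min:
            labels[i] = NOISE  # may be reassigned later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= n_min:  # expand only through core points
                queue.extend(j_neighbors)
    return labels

# Synthetic data (illustrative assumption): a ring of radius 5 ...
ring = [(5 * math.cos(2 * math.pi * k / 40), 5 * math.sin(2 * math.pi * k / 40))
        for k in range(40)]
# ... enclosing a compact blob at the origin ...
blob = [(0.0, 0.0)] + [(0.5 * math.cos(2 * math.pi * k / 8),
                        0.5 * math.sin(2 * math.pi * k / 8)) for k in range(8)]
# ... plus one isolated point far from both.
outlier = [(20.0, 20.0)]

points = ring + blob + outlier
labels = dbscan(points, eps=1.0, n_min=3)
n_clusters = len({lab for lab in labels if lab != -1})
n_noise = labels.count(-1)
print(n_clusters, n_noise)
```

In this sketch the number of clusters is never supplied: the ring and the blob are recovered as two separate clusters even though the ring is not convex and surrounds the blob, and the isolated point is labeled as noise rather than being forced into a cluster.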
            A few disadvantages of DBSCAN are as follows:

            1. Like K-means and hierarchical clustering, DBSCAN is not suited for large,
               high-dimensional datasets because of the numerous distance calculations
               required for the assessment of density.
            2. It is computationally expensive.
            3. It is not suited for datasets whose clusters differ widely in density.
            4. Like K-means and hierarchical clustering, DBSCAN requires feature
               scaling and dimensionality reduction.
            5. A user needs to carefully select the optimal values of bandwidth and nmin.
               Choosing these values correctly is important for the performance of the
               algorithm. Sometimes, domain knowledge is needed to select good
               values for bandwidth and nmin.
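The need for feature scaling (disadvantage 4) can be made concrete with a small sketch. The three samples below are hypothetical (porosity as a fraction, permeability in mD) and are not taken from the study. The z-score standardization shown is one common scaling choice, assumed here for illustration only.

```python
import math
import statistics

# Hypothetical samples: (porosity [fraction], permeability [mD]).
samples = [(0.10, 50.0), (0.30, 55.0), (0.12, 400.0)]

def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

scaled = zscore_columns(samples)

# Samples 0 and 1 differ mainly in porosity; samples 0 and 2 differ mainly
# in permeability.
raw_d01 = math.dist(samples[0], samples[1])
raw_d02 = math.dist(samples[0], samples[2])
scaled_d01 = math.dist(scaled[0], scaled[1])
scaled_d02 = math.dist(scaled[0], scaled[2])

print(f"raw:    d01={raw_d01:.2f}  d02={raw_d02:.2f}")
print(f"scaled: d01={scaled_d01:.2f}  d02={scaled_d02:.2f}")
```

Without scaling, the Euclidean distance is governed almost entirely by permeability because of its larger numeric range, so a threefold change in porosity barely registers. After standardization both features contribute on a comparable footing, which is what a single distance threshold such as the bandwidth implicitly assumes.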


            5 Features/attributes for the proposed noninvasive
            visualization of geomechanical alteration

            Features are measurable properties or characteristics that describe a system
            or phenomenon. For example, porosity, permeability, and oil saturation can be
            considered as features for identifying a good hydrocarbon reservoir. Each
            feature should be informative, discriminating, and independent to develop
            robust unsupervised methods. An unsupervised clustering method is only as good
            as the available features and the quality of the data. The original data for this
            study comprise shear waveforms, which are measurements of amplitude at
            certain time steps (Fig. 2.5, top). Each waveform is made of 1375 signal
            amplitudes measured at a time step of 40 ns. The sensor is placed opposite
            the source in a transducer assembly. When scanning the axial surface,
            waveforms are acquired every 1 mm along the diameter; consequently, 133
            waveforms are collected by each of the seven transducer assemblies, for a
            total of 931 waveforms. The waveform dataset from the axial surface has a
            size of 931 samples and dimensionality of 1375. Such high dimensionality is
            not conducive to distance-based or density-based clustering. High
            dimensionality leads to several adverse effects, also referred to as the curse of