Page 63 - Machine Learning for Subsurface Characterization

Characterization of fracture-induced geomechanical alterations Chapter  2 49





























             FIG. 2.5 (Upper panel) Original shear waveform and (lower panel) corresponding spectrogram
             obtained by processing the waveform using short-time Fourier transform (STFT).


             dimensionality. A large number of features (high dimensionality)
             raises several challenges:
             • Distance-, density-, and volume-based machine learning methods fail to
                find a generalizable data-driven model.
             • Data-driven models become computationally expensive to develop and test.
             • Storing the high-dimensional dataset requires a large amount of memory.
             • More data are required to develop generalizable models.
             • Model predictions become nonunique and more sensitive to noise.
             • Data-driven models and their predictions become harder to interpret and
                explain.
             • Exploratory data analysis is challenging because it is difficult to
                visualize the relationships among features and those between the
                features and targets.
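
             The first challenge in the list can be illustrated with a small numerical
             experiment. The sketch below is not part of the original workflow; it assumes
             uniformly random points (a hypothetical stand-in for real features) and a
             hypothetical helper named relative_contrast. It measures how much farther the
             farthest sample is than the nearest one from a query point. As the number of
             features grows, this contrast shrinks, so distance-based methods have less
             information with which to separate near from far samples.

```python
import numpy as np


def relative_contrast(n_features, n_samples=500, seed=0):
    """Return (max - min) / min Euclidean distance from one random query
    point to n_samples uniformly random points in n_features dimensions."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_samples, n_features))  # synthetic data, assumption
    query = rng.random(n_features)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()


# Compare a low-dimensional case against increasingly high-dimensional ones.
contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:5d} features: relative contrast = {c:.2f}")
```

             A contrast near zero means that all samples are almost equally far from the
             query, which is one reason clustering on raw high-dimensional data tends to
             generalize poorly.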
             We use feature engineering followed by dimensionality reduction (Steps 4 and 5
             in the workflow shown in Fig. 2.2) to convert the high-dimensional shear-
             waveform dataset to a low-dimensional dataset suitable for clustering methods.
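
             A minimal sketch of this two-stage idea is given below, under several
             assumptions that do not come from the text: the waveforms are synthetic
             damped sinusoids, the engineered features are a time-averaged STFT magnitude
             spectrum plus two time-domain statistics, and the dimensionality reduction is
             principal component analysis computed through a singular value decomposition.
             The function names stft_magnitude and engineer_features are hypothetical. The
             actual features and reduction method used in Steps 4 and 5 may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 waveforms with 1024 time samples each.
n_waveforms, n_samples = 200, 1024
t = np.arange(n_samples)
freqs = rng.uniform(0.01, 0.05, size=n_waveforms)  # cycles per sample
waveforms = np.sin(2 * np.pi * freqs[:, None] * t) * np.exp(-t / 400.0)
waveforms += 0.05 * rng.standard_normal(waveforms.shape)


def stft_magnitude(x, win=128, hop=64):
    """Magnitude of a short-time Fourier transform with a Hann window."""
    window = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, win // 2 + 1)


def engineer_features(x):
    """Illustrative features: time-averaged spectrum and two amplitude statistics."""
    spec = stft_magnitude(x)
    return np.concatenate([spec.mean(axis=0), [x.std(), np.abs(x).max()]])


# Feature engineering: each 1024-sample waveform becomes a shorter feature vector.
features = np.stack([engineer_features(w) for w in waveforms])

# Dimensionality reduction: PCA via SVD on standardized features.
z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
_, s, vt = np.linalg.svd(z, full_matrices=False)
n_components = 3  # assumed target dimensionality
reduced = z @ vt[:n_components].T
explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()

print(waveforms.shape, "->", features.shape, "->", reduced.shape)
print(f"variance retained by {n_components} components: {explained:.2f}")
```

             The reduced array has one row per waveform and only a few columns, which is
             the kind of input that distance-based clustering methods handle well.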

             5.1 Feature engineering

             Feature engineering is the process of using domain knowledge along with
             mathematical/statistical transformations to derive new features from the raw