Page 63 - Machine Learning for Subsurface Characterization
Characterization of fracture-induced geomechanical alterations Chapter 2 49
FIG. 2.5 (Upper panel) Original shear waveform and (lower panel) corresponding spectrogram
obtained by processing the waveform using short-time Fourier transform (STFT).
dimensionality. A large number of features (high dimensionality) creates
several challenges:
• Distance-, density-, and volume-based machine learning methods fail to find
a generalizable data-driven model.
• Developing and testing the data-driven models becomes computationally
expensive.
• Storing the high-dimensional dataset requires a large amount of memory.
• More data are required to develop generalizable models.
• Model predictions become nonunique and more sensitive to noise.
• Data-driven models and their predictions become harder to interpret and
explain.
• Exploratory data analysis is challenging because the relationships among
features, and between features and targets, are difficult to visualize.
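The first challenge in the list above can be illustrated with a small numerical experiment. The sketch below is not taken from the chapter; it assumes points drawn uniformly at random from a unit hypercube and uses a hypothetical helper, relative_contrast, to measure how much the farthest neighbor of a query point differs from its nearest neighbor. As the number of dimensions grows, this contrast shrinks, which is one reason distance-based methods lose discriminating power.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Illustrative helper (an assumption, not from the source text).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Print the contrast for an increasing number of features (dimensions).
for n_dims in (2, 20, 200, 2000):
    print(n_dims, round(relative_contrast(500, n_dims), 3))
```

A contrast close to zero means that all points look almost equally far from the query, so nearest-neighbor and density estimates carry little information.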
We use feature engineering followed by dimensionality reduction (Steps 4 and 5
in the workflow shown in Fig. 2.2) to convert the high-dimensional shear-
waveform dataset to a low-dimensional dataset suitable for clustering methods.
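The two-step conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the waveforms are synthetic decaying sinusoids, the engineered features are simply the flattened STFT magnitudes, and the dimensionality reduction is principal component analysis computed with a singular value decomposition. The function names, frame length, hop size, and number of components are all hypothetical choices.

```python
import numpy as np


def stft_magnitude(x, frame_len=64, hop=32):
    """Magnitude spectrogram from Hann-windowed frames and a real FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))


def engineer_features(x):
    """Feature engineering step (assumed): flatten the spectrogram."""
    return stft_magnitude(x).ravel()


def pca_reduce(features, n_components=2):
    """Dimensionality reduction step (assumed): PCA via SVD."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    z = (features - mu) / sigma
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


# Synthetic stand-in for the shear-waveform dataset (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512, endpoint=False)
waveforms = np.stack(
    [
        np.sin(2 * np.pi * f * t) * np.exp(-3 * t)
        + 0.05 * rng.standard_normal(t.size)
        for f in rng.uniform(20, 60, size=40)
    ]
)

features = np.stack([engineer_features(w) for w in waveforms])
reduced = pca_reduce(features, n_components=2)
print(features.shape, "->", reduced.shape)
```

The low-dimensional array reduced has one row per waveform and could then be passed to a clustering method; which features and which reduction technique work best depends on the data.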
5.1 Feature engineering
Feature engineering is the process of using domain knowledge along with
mathematical/statistical transformations to derive new features from the raw