Page 66 - Machine Learning for Subsurface Characterization
stopping criterion is met when the extracted IMF has a small amplitude or when
the residue becomes monotonic [21]. The drawbacks of the original EMD algorithm include
mode mixing (the spread of one scale over different IMFs), aliasing (overlapping of
IMFs caused by an insufficient sampling rate), and the generation of false
modes [22]. Improved EMD techniques have been proposed, namely, ensemble
EMD (EEMD) and complete ensemble EMD with adaptive noise (CEEMDAN).
The improved versions circumvent the problem of mode mixing; however,
the generation of spurious IMFs is still not uncommon [24]. EMD-based methods
have been extensively applied to nondestructive evaluation, but they are more
computationally expensive than STFT- or CWT-based methods, and their
application to seismic data is limited. Nonetheless, there have been attempts
to apply EMD-based methods to seismic attribute analysis [23,24].
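The sifting procedure and the two stopping rules mentioned above can be sketched as follows. This is a minimal, hypothetical NumPy illustration of EMD, not part of the workflow used in this study (which relies on the STFT). To stay dependency-free it connects the extrema with linear interpolation instead of the cubic splines usually used in EMD, pins the envelopes to the end points as a crude boundary treatment, and uses hypothetical thresholds `sift_tol` and `amp_tol`.

```python
import numpy as np

def local_extrema(h):
    """Indices of interior local maxima and minima of h."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def envelope(t, h, idx):
    """Envelope through the extrema at idx.
    Simplification: linear interpolation instead of cubic splines, and the
    envelope is pinned to the first and last samples (crude boundary handling)."""
    idx = np.concatenate(([0], idx, [len(h) - 1]))
    return np.interp(t, t[idx], h[idx])

def is_monotonic(h):
    d = np.diff(h)
    return bool(np.all(d >= 0) or np.all(d <= 0))

def emd(x, t, max_imfs=8, max_sifts=50, sift_tol=0.05, amp_tol=1e-3):
    """Minimal EMD sketch; sift_tol and amp_tol are hypothetical thresholds."""
    residual = np.asarray(x, dtype=float).copy()
    scale = np.max(np.abs(residual))
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        # Stop when the residue is monotonic or has too few extrema for envelopes.
        if is_monotonic(residual) or len(maxima) < 2 or len(minima) < 2:
            break
        h = residual.copy()
        for _ in range(max_sifts):  # sifting loop
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            mean_env = 0.5 * (envelope(t, h, maxima) + envelope(t, h, minima))
            h = h - mean_env
            # Sifting stops once the envelope mean is small relative to h.
            if np.sum(mean_env ** 2) < sift_tol * np.sum(h ** 2):
                break
        imfs.append(h)
        residual = residual - h
        # Stop once the extracted IMF has a negligible amplitude.
        if np.max(np.abs(h)) < amp_tol * scale:
            break
    return imfs, residual

# Usage on a synthetic signal: 40 Hz tone + 4 Hz tone + linear trend.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 4 * t) + 2 * t
imfs, residual = emd(x, t)
print(len(imfs), "IMFs extracted")
# The decomposition is complete by construction: IMFs + residue = signal.
print(np.allclose(np.sum(imfs, axis=0) + residual, x))  # True
```

Noise-assisted variants such as EEMD repeat this decomposition on many copies of the signal perturbed with white noise and average the resulting IMFs, which is why they cost considerably more than a single STFT.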
In our study, we implement the short-time Fourier transform (STFT) on the
waveforms as the feature engineering method. Applying the fast Fourier transform
(FFT) over a long time window does not reveal how the spectral content changes
with time. To avoid this problem, the FFT is applied over short periods of time;
for sufficiently short time windows, nonstationary signals can be considered stationary.
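The loss of timing information with a long-window FFT can be demonstrated with a small synthetic example. The sketch below is hypothetical (1 kHz sampling, a 50 Hz tone followed by a 200 Hz tone, and the same tones in reverse order); it only illustrates the argument above and does not use the shear waveforms of this study.

```python
import numpy as np

fs = 1000.0                          # hypothetical sampling rate, Hz
n = 1000
t = np.arange(n // 2) / fs
low = np.sin(2 * np.pi * 50 * t)     # 50 Hz tone
high = np.sin(2 * np.pi * 200 * t)   # 200 Hz tone

x1 = np.concatenate([low, high])     # 50 Hz first, then 200 Hz
x2 = np.concatenate([high, low])     # same tones, opposite order

# FFT over the whole record: x2 is a circular shift of x1 by n/2 samples,
# so the magnitude spectra are identical and the time ordering is lost.
spec1 = np.abs(np.fft.rfft(x1))
spec2 = np.abs(np.fft.rfft(x2))
print(np.allclose(spec1, spec2))     # True

# FFT over a short window (first 100 samples): the dominant frequency differs.
freqs = np.fft.rfftfreq(100, d=1 / fs)
peak1 = freqs[np.argmax(np.abs(np.fft.rfft(x1[:100])))]
peak2 = freqs[np.argmax(np.abs(np.fft.rfft(x2[:100])))]
print(peak1, peak2)                  # 50.0 200.0
```

Sliding such a short window along the whole record, and tapering each segment with a windowing function, is exactly what the STFT does.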
The STFT is a powerful tool for audio signal processing [25] and is widely
used in machine learning-assisted speech recognition, music analysis, and
automatic transcription of audio [25–27]. To generate the STFT, the first step
is to define an analysis window and a windowing function that divide the signal
into segments; the FFT is then applied to each segment to obtain the
short-time Fourier transform [26]. Owing to its ability to handle
nonstationary time series, the STFT has been extensively applied to monitor
seismicity associated with volcanic activity [7,8] and with rock
stability [9]. Fig. 2.5 shows an example of a shear waveform used in the
present study and the corresponding STFT-based spectrogram. The coefficients
of a spectrogram express the time-frequency variations of the signal and are
used as features. Each raw waveform is transformed using
STFT to generate a spectrogram with 12 time steps and 15 frequency
steps. Each time step is 5 μs, and each frequency step is 100 kHz. Hence, by
implementing STFT as the feature engineering method, 180 (12 × 15) features
are derived from each raw shear waveform. The features were then scaled using
robust scaling prior to clustering.
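The feature engineering step can be sketched in NumPy. The text gives the spectrogram size (12 × 15), the time step (5 μs), and the frequency step (100 kHz), but not the sampling rate, window, or overlap, so the sketch picks one hypothetical combination consistent with those numbers: a 2.8 MHz sampling rate, a 28-sample Hann window (10 μs, giving 100 kHz bins and 15 one-sided frequency bins), a 14-sample hop (5 μs), and 182-sample records (12 windows). The waveforms are synthetic, the spectrogram coefficients are taken as STFT magnitudes, and `robust_scale` is a stand-in that mirrors the defaults of scikit-learn's `RobustScaler` (median removal, interquartile-range scaling).

```python
import numpy as np

# Hypothetical acquisition/STFT parameters chosen to reproduce a 15 x 12 spectrogram.
FS = 2.8e6        # assumed sampling rate, Hz
NPERSEG = 28      # 10 us window -> FS / NPERSEG = 100 kHz bins, 15 one-sided bins
HOP = 14          # 5 us between consecutive windows (time step)
N_SAMPLES = 182   # assumed record length -> 1 + (182 - 28) // 14 = 12 time steps

def spectrogram(x):
    """STFT magnitudes: Hann-windowed segments, one-sided FFT of each segment."""
    window = np.hanning(NPERSEG)
    n_frames = 1 + (len(x) - NPERSEG) // HOP
    frames = np.stack([x[i * HOP:i * HOP + NPERSEG] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # rows: frequency, cols: time

def robust_scale(X):
    """Center each feature on its median and divide by its interquartile range."""
    median = np.median(X, axis=0)
    q25, q75 = np.percentile(X, [25, 75], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0   # avoid division by zero for constant features
    return (X - median) / iqr

# Hypothetical shear-like waveforms: Gaussian-tapered tone bursts plus noise.
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES) / FS
waveforms = []
for _ in range(50):
    f0 = rng.uniform(200e3, 600e3)   # burst frequency, Hz
    t0 = rng.uniform(15e-6, 45e-6)   # burst arrival time, s
    w = np.exp(-((t - t0) / 8e-6) ** 2) * np.sin(2 * np.pi * f0 * t)
    waveforms.append(w + 0.05 * rng.standard_normal(N_SAMPLES))

# Flatten each 15 x 12 spectrogram into a 180-element feature vector.
features = np.array([spectrogram(w).ravel() for w in waveforms])
scaled = robust_scale(features)
print(spectrogram(waveforms[0]).shape, features.shape)   # (15, 12) (50, 180)
```

If scikit-learn is available, `sklearn.preprocessing.RobustScaler().fit_transform(features)` should produce equivalent scaling. Median and interquartile statistics are less affected by a few anomalous waveforms than the mean and standard deviation, which matters when the scaled features feed a clustering step.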
5.2 Dimensionality reduction
After feature engineering, we perform dimensionality reduction (Step 5 in the
workflow shown in Fig. 2.2) on the newly derived feature set to obtain a smaller
set of informative, nonredundant features that facilitate the subsequent learning
and generalization steps. Dimensionality reduction is essential for
processing the engineered dataset represented in terms of 180 STFT-derived