Page 66 - Machine Learning for Subsurface Characterization



            stopping criterion is met when the extracted IMF has a small amplitude or when it
            becomes monotonic [21]. The drawbacks of the original EMD algorithm include
            mode mixing (spread of one scale over different IMFs), aliasing (overlapping of
            IMFs caused by an insufficient sampling rate), and generation of false
            modes [22]. Improved EMD techniques have been proposed, namely, ensemble
            EMD (EEMD) and complete ensemble EMD with adaptive noise (CEEMDAN).
            The improved versions circumvent the problem of mode mixing; however,
            the generation of spurious IMFs is not uncommon [24]. EMD-based methods
            have been extensively applied to nondestructive evaluation. They are more
            computationally expensive than STFT- or CWT-based methods, and their
            application to seismic data is limited. Nonetheless, there have been
            attempts to apply EMD-based methods to seismic attribute analysis [23,24].
               In our study, we apply the short-time Fourier transform (STFT) to the
            waveforms as the feature engineering method. Applying the FFT over a long
            time window does not reveal how the spectral content changes with time. To
            avoid this problem, the FFT is applied over short periods of time. For time
            windows that are short enough, nonstationary signals can be considered stationary.
            STFT is a powerful tool for audio signal processing [25] and is widely used in
            machine-learning-assisted speech recognition, music analysis, and automatic
            transcription of audio [25–27]. To generate the STFT, the first step is to
            define an analysis window and a windowing function to generate segments. The
            FFT is then applied to the generated segments to obtain the STFT [26]. Owing to its
            capabilities in handling nonstationary time series, STFT has been extensively
            applied to monitor seismicity associated with volcanic activity [7, 8] and
            seismicity associated with rock stability [9]. Fig. 2.5 shows an example of a
            shear waveform used in the present study and the corresponding STFT-based
            spectrogram. The coefficients of a spectrogram express the time-frequency
            variations and are used as features. Each raw waveform is transformed using
            STFT to generate a spectrogram having 12 time steps and 15 frequency
            steps. Each time step is 5 μs, and each frequency step is 100 kHz. Hence, by
            implementing STFT as the feature engineering method, 12 × 15 = 180 features are
            derived from each raw shear waveform. The features were then scaled using
            robust scaling prior to clustering.
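
               The STFT feature extraction described above can be sketched as follows. This
            is a minimal illustration, not the authors' implementation. The 10 MHz sampling
            rate, the 10 μs Hann window (which gives the 100 kHz frequency step), the 50%
            overlap (which gives the 5 μs time step), the choice of the lowest 15 frequency
            bins, the synthetic waveforms, and the helper names stft_features and
            robust_scale are all assumptions made for the example.

```python
import numpy as np

FS = 10e6  # assumed sampling rate (Hz); not stated in the text


def stft_features(waveform, fs=FS, win_dur=10e-6, hop_dur=5e-6, n_freq=15):
    """Magnitude spectrogram of shape (time steps, frequency steps)."""
    n_win = int(round(win_dur * fs))  # 100 samples -> fs / n_win = 100 kHz per bin
    n_hop = int(round(hop_dur * fs))  # 50 samples -> 5 us between time steps
    window = np.hanning(n_win)        # assumed windowing function
    n_frames = 1 + (len(waveform) - n_win) // n_hop
    frames = np.stack([
        waveform[i * n_hop:i * n_hop + n_win] * window for i in range(n_frames)
    ])
    # FFT of each windowed segment; keep the lowest n_freq bins (an assumption)
    return np.abs(np.fft.rfft(frames, axis=1))[:, :n_freq]


def robust_scale(X):
    """Center each feature on its median and divide by its interquartile range."""
    median = np.median(X, axis=0)
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # avoid division by zero for constant features
    return (X - median) / iqr


# Synthetic stand-ins for the shear waveforms: 65 us decaying sinusoids plus noise
rng = np.random.default_rng(0)
t = np.arange(650) / FS
waveforms = [
    np.sin(2 * np.pi * f0 * t) * np.exp(-t / 20e-6)
    + 0.05 * rng.standard_normal(t.size)
    for f0 in rng.uniform(200e3, 1.2e6, size=21)
]

features = np.stack([stft_features(w).ravel() for w in waveforms])
scaled = robust_scale(features)
print(features.shape)  # (21, 180): 12 time steps x 15 frequency steps per waveform
```

               With these assumed settings, each 65 μs waveform yields 12 × 15 = 180
            features, matching the count given in the text. Library routines such as
            scipy.signal.stft and sklearn.preprocessing.RobustScaler provide similar
            functionality, although their default padding and scaling options may differ
            from this sketch.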



            5.2 Dimensionality reduction
            After feature engineering, we perform dimensionality reduction (Step 5 in the
            workflow shown in Fig. 2.2) on the newly derived feature set to obtain a smaller
            set of informative, nonredundant features that facilitate the subsequent
            learning and generalization steps. Dimensionality reduction is essential for
            processing the engineered dataset represented in terms of 180 STFT-derived