Page 17 - Machine Learning for Subsurface Characterization



                Appendix B Confusion matrix to quantify the inlier and outlier
                         detections by the unsupervised ODTs  34
                Appendix C Values of important hyperparameters of the
                         unsupervised ODT models  34
                Appendix D Receiver operating characteristics (ROC) and
                         precision-recall (PR) curves for various unsupervised
                         ODTs on the Dataset #1  34
                Acknowledgments  36
                References  36




            1 Introduction
            From a statistical standpoint, outliers are data points (samples) that are signif-
            icantly different from the general trend of the dataset. From a conceptual
            standpoint, a sample is considered as an outlier when it does not represent
            the behavior of the phenomenon/process as represented by most of the sam-
            ples in a dataset. Outliers are indicative of issues in data collection/measure-
            ment procedure or unexpected events in the operation/process that generated
            the data. Detection and removal of outliers is an important step prior to
            building a robust data-driven (DD) and machine learning-based (ML) model.
            Outliers skew the descriptive statistics used by data analysis, data-driven, and
            machine learning methods to build the data-driven model. A model developed
            on data containing outliers will not accurately represent the normal
            behavior of the data because the model picks up the unrepresentative patterns
            exhibited by the outliers. As a result, there will be nonuniqueness in the
            model predictions. Data-driven models affected by outliers have lower
            predictive accuracy and generalization capability.
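               The effect of outliers on descriptive statistics can be illustrated with a
            minimal sketch. The density readings below are hypothetical values invented
            for illustration, not data from this chapter; the sketch only assumes
            Python's standard statistics module.

```python
import statistics

# Hypothetical bulk-density readings (g/cm3); values are invented for illustration.
clean = [2.41, 2.45, 2.38, 2.50, 2.43, 2.47, 2.40, 2.44]
# The same readings plus one spurious low value, as an assumed outlier.
contaminated = clean + [1.20]


def summarize(values):
    """Return the mean, median, and sample standard deviation of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


clean_stats = summarize(clean)
contaminated_stats = summarize(contaminated)

# Print each statistic side by side for the two versions of the dataset.
for name in ("mean", "median", "stdev"):
    print(
        f"{name:>6}: clean={clean_stats[name]:.3f}  "
        f"with outlier={contaminated_stats[name]:.3f}"
    )
```

            In this sketch, a single spurious sample shifts the mean and inflates the
            standard deviation noticeably, while the median changes very little.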
               Outlier handling refers to all the steps taken to negate the adverse effects of
            outliers in a dataset. After detecting the outliers in a dataset, how they are handled
            depends on the immediate use of the dataset. Outliers can be removed, replaced,
            or transformed depending on the type of dataset and its use. Outlier handling is
            particularly important as outliers could enhance or mask relevant statistical
            characteristics of the dataset. For instance, outliers in weather data could be early signs of a
            weather disaster; ignoring this could have catastrophic consequences. However,
            before considering outlier handling, we must first detect them.
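               The three handling options (removal, replacement, and transformation) can be
            sketched as follows. This is only an illustration under stated assumptions:
            the readings are hypothetical, and the detection step uses a simple
            median-based modified z-score with a rule-of-thumb threshold of 3.5 as a
            stand-in, not the unsupervised detection techniques compared in this chapter.

```python
import math
import statistics


def flag_outliers(values, threshold=3.5):
    """Flag samples whose median/MAD-based modified z-score exceeds the threshold.

    Stand-in detector for illustration only; the threshold is an assumed
    rule-of-thumb value.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All deviations are zero for at least half the samples; flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical positive-valued readings with one suspicious sample (95.0).
values = [12.0, 11.5, 12.3, 11.8, 12.1, 95.0, 11.9, 12.2]
flags = flag_outliers(values)

# Option 1: remove the flagged samples.
removed = [v for v, is_outlier in zip(values, flags) if not is_outlier]

# Option 2: replace the flagged samples with the median of the unflagged ones.
inlier_median = statistics.median(removed)
replaced = [inlier_median if is_outlier else v for v, is_outlier in zip(values, flags)]

# Option 3: transform all samples; a log transform compresses large values
# (valid here only because every reading is positive).
transformed = [math.log10(v) for v in values]

print("flags:      ", flags)
print("removed:    ", removed)
print("replaced:   ", replaced)
print("transformed:", [round(t, 3) for t in transformed])
```

            Which option is appropriate depends on the type of dataset and its intended
            use, as noted above; the sketch does not recommend one over the others.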
               Outliers in well logs and other borehole-based subsurface measurements
            occur due to wellbore conditions, logging tool deployment, and physical
            characteristics of the geological formations. For example, washed-out zones
            in the wellbore and borehole rugosity significantly affect the readings of
            shallow-sensing logs, such as density, sonic, and photoelectric factor (PEF)
            logs, resulting in outlier response. Along with wellbore conditions, uncommon
            beds and sudden changes in physical/chemical properties at a certain depth in a
            formation also result in outlier behavior of the subsurface measurements. In
            this chapter, we perform a comparative study of the performances of four