Page 17 - Machine Learning for Subsurface Characterization
Appendix B Confusion matrix to quantify the inlier and outlier detections by the unsupervised ODTs 34
Appendix C Values of important hyperparameters of the unsupervised ODT models 34
Appendix D Receiver operating characteristics (ROC) and precision-recall (PR) curves for various unsupervised ODTs on the Dataset #1 34
Acknowledgments 36
References 36
1 Introduction
From a statistical standpoint, outliers are data points (samples) that are signif-
icantly different from the general trend of the dataset. From a conceptual
standpoint, a sample is considered as an outlier when it does not represent
the behavior of the phenomenon/process as represented by most of the sam-
ples in a dataset. Outliers are indicative of issues in data collection/measure-
ment procedure or unexpected events in the operation/process that generated
the data. Detection and removal of outliers is an important step prior to
building a robust data-driven (DD) and machine learning-based (ML) model.
Outliers skew the descriptive statistics used by data analysis, data-driven, and
machine learning methods to build the data-driven model. A model developed
on data containing outliers will not accurately represent the normal
behavior of data because the model picks up the unrepresentative patterns
exhibited by the outliers. As a result, there will be nonuniqueness in the model
predictions. Data-driven models affected by outliers have lower predictive
accuracy and generalization capability.
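The effect of a single outlier on descriptive statistics can be illustrated with a minimal sketch. The readings below are hypothetical bulk-density values chosen for illustration only (they are not taken from any dataset in this chapter), and the helper name describe is an assumption. The sketch compares the mean, median, and standard deviation with and without one anomalous reading.

```python
import statistics


def describe(values):
    """Return the mean, median, and sample standard deviation of a sequence."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical bulk-density readings (g/cm^3); illustrative values only.
clean = [2.45, 2.50, 2.48, 2.52, 2.47, 2.51, 2.49, 2.50]
# The same readings plus one hypothetical anomalous reading.
contaminated = clean + [1.20]

clean_stats = describe(clean)
contaminated_stats = describe(contaminated)

for name in ("mean", "median", "stdev"):
    print(
        f"{name}: clean={clean_stats[name]:.3f}, "
        f"with outlier={contaminated_stats[name]:.3f}"
    )
```

In this sketch the mean and standard deviation shift noticeably when the anomalous reading is included, while the median barely moves, which is one reason rank-based statistics are often preferred when outliers are suspected.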
Outlier handling refers to all the steps taken to negate the adverse effects of
outliers in a dataset. After detecting the outliers in a dataset, how they are handled
depends on the immediate use of the dataset. Outliers can be removed, replaced,
or transformed depending on the type of dataset and its use. Outlier handling is
particularly important as outliers could enhance or mask relevant statistical
characteristics of the dataset. For instance, outliers in weather data could be early signs of a
weather disaster; ignoring them could have catastrophic consequences. However,
before considering outlier handling, we must first detect the outliers.
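The three handling options named above (removal, replacement, and transformation) can be sketched as follows. This is an illustrative sketch, not the procedure used in this chapter: it assumes the outliers have already been flagged by some detector, the function names are hypothetical, replacement uses the median of the unflagged samples, and clipping to the inlier range stands in as one possible transformation.

```python
import statistics


def remove_outliers(values, is_outlier):
    """Drop the flagged samples."""
    return [v for v, flag in zip(values, is_outlier) if not flag]


def replace_outliers(values, is_outlier):
    """Replace flagged samples with the median of the unflagged samples."""
    fill = statistics.median(remove_outliers(values, is_outlier))
    return [fill if flag else v for v, flag in zip(values, is_outlier)]


def clip_outliers(values, is_outlier):
    """Transform flagged samples by clipping them to the inlier range."""
    inliers = remove_outliers(values, is_outlier)
    lo, hi = min(inliers), max(inliers)
    return [min(max(v, lo), hi) if flag else v
            for v, flag in zip(values, is_outlier)]


# Hypothetical readings with one sample already flagged as an outlier.
values = [2.45, 2.50, 1.20, 2.52, 2.47]
flags = [False, False, True, False, False]

removed = remove_outliers(values, flags)    # shorter sequence
replaced = replace_outliers(values, flags)  # same length, outlier imputed
clipped = clip_outliers(values, flags)      # same length, outlier bounded

print(removed)
print(replaced)
print(clipped)
```

Removal shortens the sequence, which matters for depth-indexed data where sample spacing must be preserved; replacement and clipping keep the original length at the cost of introducing values that were never measured.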
Outliers in well logs and other borehole-based subsurface measurements
occur due to wellbore conditions, logging tool deployment, and physical
characteristics of the geological formations. For example, washed-out zones
in the wellbore and borehole rugosity significantly affect the readings of
shallow-sensing logs, such as density, sonic, and photoelectric factor (PEF)
logs, resulting in outlier responses. Along with wellbore conditions, uncommon
beds and sudden changes in physical/chemical properties at a certain depth in a
formation also result in outlier behavior of the subsurface measurements. In
this chapter, we perform a comparative study of the performances of four