Page 22 - Machine Learning for Subsurface Characterization

Unsupervised outlier detection techniques Chapter  1 7


             many real-world applications, these values are known. For example, in the
             medical field, there is a good estimate of the fraction of people who contract
             a certain rare disease, and on a factory assembly line, there is a good estimate
             of the fraction of defective mechanical parts. Unfortunately, when working with
             well logs and other geophysical datasets, the expected fraction of outliers is
             not necessarily known a priori because it depends on several factors (operating
             conditions during logging, type of formation, sensor physics, etc.). This is a
             significant challenge in applying unsupervised ODTs to well-log data and other
             geophysical data.
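
To illustrate why the expected outlier fraction matters, the following minimal sketch uses the `contamination` argument of scikit-learn's `IsolationForest`, which sets the expected fraction of outliers. The data are synthetic Gaussian values standing in for log responses, and the three contamination values are arbitrary; both are assumptions made for illustration only, not settings from this study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for log data: 1000 depth samples, 3 log features (assumed).
X = rng.normal(size=(1000, 3))

flagged = {}
for contamination in (0.01, 0.05, 0.10):
    # contamination = assumed fraction of outliers in the dataset
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(X)  # -1 = outlier, +1 = inlier
    flagged[contamination] = int((labels == -1).sum())

for contamination, count in flagged.items():
    print(f"contamination={contamination:.2f} -> {count} samples flagged")
```

The same data yield different numbers of flagged samples purely because of the assumed fraction, which is why the absence of a reliable prior estimate is a practical difficulty.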
                Under unsupervised conditions, the accuracy and robustness of an ODT rely on
             the values of its hyperparameters. Hyperparameters are user-defined parameters
             specified prior to applying a data-driven method to a dataset. Hyperparameters
             control the learning of the data-driven method and determine the final
             functional form of the data-driven model. Hyperparameters govern the learning
             process, whereas parameters (weights) are a consequence of the learning process.
             The choice of hyperparameters can make one unsupervised outlier-detection model
             perform poorly compared with other outlier-detection models on the same
             dataset. Unfortunately, when using unsupervised ODTs on well logs and
             subsurface data, there is no prior information about suitable hyperparameter
             values. Generally, an unsupervised ODT must be applied to well-log and
             geophysical datasets without hyperparameter tuning and without prior
             information about the hyperparameters. The primary motivation of our study is
             to identify the best-performing unsupervised ODT that needs minimal
             hyperparameter tuning and manual intervention.
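
The distinction between hyperparameters and learned parameters can be sketched with scikit-learn's `OneClassSVM`. In this sketch, `nu` and `gamma` are hyperparameters chosen before fitting, while the support vectors and intercept are produced by fitting. The synthetic data and the specific `nu` values are assumptions for illustration and are not taken from this study.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Synthetic stand-in for log data: 500 depth samples, 3 log features (assumed).
X = rng.normal(size=(500, 3))

# Hyperparameters: set by the user before any learning takes place.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
hyperparameters = model.get_params()

# Parameters: available only after the model has been fitted to the data.
model.fit(X)
learned = {
    "n_support_vectors": model.support_vectors_.shape[0],
    "intercept": float(model.intercept_[0]),
}

# Changing one hyperparameter changes how many samples are flagged.
n_flagged = {}
for nu in (0.01, 0.05, 0.2):
    labels = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit_predict(X)
    n_flagged[nu] = int((labels == -1).sum())  # -1 = outlier

print(hyperparameters["nu"], learned, n_flagged)
```

Because no labels are available to score these alternatives, the sketch only shows that the outcome is sensitive to `nu`; it does not indicate which value is appropriate.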



             3  Unsupervised outlier detection techniques
             In this article, we apply four unsupervised ODTs to well logs to identify the
             formation depths that exhibit anomalous or outlier log responses. The ODTs
             were used in an unsupervised manner with minimal hyperparameter tuning.
             Each formation depth can be considered as a sample, and the various logs
             acquired at a specific depth can be considered as features. Because the
             approach is unsupervised, there is no target or desired outcome for a given set
             of feature values (feature vector) of a sample. An unsupervised ODT processes
             the feature vectors of the available samples, which contain both normal (inlier)
             and anomalous (outlier) behavior, to identify the depths that exhibit outlier
             behavior. Unsupervised ODTs are based on distance, density, decision boundary,
             or affinity, which are used to quantify the relationships among the features
             governing the inlier and outlier behavior of samples. In this section, we
             introduce four unsupervised ODTs, namely, isolation forest (IF), one-class SVM
             (OCSVM), local outlier factor (LOF), and density-based spatial clustering of
             applications with noise (DBSCAN). In this study, all methods are implemented
             using the scikit-learn package.
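
The sample-and-feature setup described above can be sketched as follows, with each row a depth and each column a log. This is a minimal illustration under stated assumptions, not the workflow used in this study: the data are synthetic, ten depths are shifted artificially to mimic anomalous responses, the features are standardized with `StandardScaler`, and every detector is left at its scikit-learn defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
n_depths, n_logs = 500, 4  # samples = depths, features = logs (assumed sizes)
logs = rng.normal(size=(n_depths, n_logs))
logs[:10] += 6.0  # shift ten depths to mimic anomalous log responses

# Put all logs on a comparable scale (an assumed preprocessing step).
X = StandardScaler().fit_transform(logs)

detectors = {
    "IF": IsolationForest(random_state=0),
    "OCSVM": OneClassSVM(),
    "LOF": LocalOutlierFactor(),
    "DBSCAN": DBSCAN(),
}

outlier_masks = {}
for name, detector in detectors.items():
    # IF, OCSVM, and LOF label outliers as -1; DBSCAN labels noise points as -1.
    labels = detector.fit_predict(X)
    outlier_masks[name] = labels == -1

for name, mask in outlier_masks.items():
    print(f"{name}: {int(mask.sum())} of {n_depths} depths flagged")
```

The four detectors will generally disagree on how many depths they flag when left at default settings. DBSCAN in particular is sensitive to its `eps` and `min_samples` defaults, so the counts here should be read as an illustration of the interface rather than as a comparison of the methods.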