Page 22 - Machine Learning for Subsurface Characterization
many real-world applications, these values are known. For example, in the
medical field, there is a good estimate of the fraction of people who contract
a certain rare disease, and on a factory assembly line, there is a good
estimate of the fraction of defective mechanical parts. Unfortunately, when
working with well logs and other geophysical datasets, the expected fraction
of outliers is not necessarily known a priori because it depends on several
factors (operating conditions during logging, type of formation, sensor
physics, etc.). This is a significant challenge in applying unsupervised ODTs
to well-log data and other geophysical data.
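To make the role of the expected outlier fraction concrete, the following is a minimal sketch, not taken from the study itself. It assumes scikit-learn's IsolationForest as one example detector and uses its `contamination` argument as the assumed outlier fraction. The synthetic data, the array shapes, and the three trial fractions are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for well-log data: 500 depth samples, 4 logs per depth.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(475, 4))
outliers = rng.uniform(-8.0, 8.0, size=(25, 4))
X = np.vstack([inliers, outliers])

# The same detector is fitted three times; only the assumed outlier fraction changes.
flagged = {}
for fraction in (0.01, 0.05, 0.20):
    model = IsolationForest(contamination=fraction, random_state=0)
    labels = model.fit_predict(X)  # -1 marks an outlier, +1 marks an inlier
    flagged[fraction] = int((labels == -1).sum())
    print(f"assumed fraction {fraction:.2f}: {flagged[fraction]} depths flagged")
```

The number of flagged depths follows the assumed fraction rather than the true one, which is why an unknown outlier fraction is a practical difficulty.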
Under unsupervised conditions, the accuracy and robustness of an ODT rely on
the values of its hyperparameters. Hyperparameters are user-defined parameters
specified before a data-driven method is applied to a dataset. They control
the learning of the data-driven method and determine the final functional form
of the data-driven model. Hyperparameters govern the learning process, whereas
parameters (weights) are a consequence of the learning process. The choice of
hyperparameters can cause one unsupervised outlier-detection model to perform
poorly compared with other outlier-detection models on the same dataset.
Unfortunately, when using unsupervised ODTs on well logs and subsurface data,
there is no prior information about suitable hyperparameter values. Generally,
an unsupervised ODT needs to be applied to the well-log and geophysical
dataset without any hyperparameter tuning and without any prior information
about the hyperparameters. The primary motivation of our study is to identify
the best-performing unsupervised ODT that needs minimal hyperparameter tuning
and manual intervention.
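The distinction between hyperparameters and learned parameters can be illustrated with a short sketch. It assumes scikit-learn's OneClassSVM as the example model. The values `nu=0.1` and `gamma="scale"` and the random data are hypothetical and are not the settings used in the study.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical feature matrix: 200 samples, 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Hyperparameters: chosen by the user before any data are seen.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
hyperparameters = model.get_params()
print("hyperparameters:", {k: hyperparameters[k] for k in ("kernel", "nu", "gamma")})

# Parameters: produced by the learning process when the model is fitted.
model.fit(X)
n_support = model.support_vectors_.shape[0]
print("learned support vectors:", n_support)
print("learned dual coefficients shape:", model.dual_coef_.shape)
print("learned intercept:", model.intercept_)
```

Changing `nu` or `gamma` and refitting yields a different set of support vectors and therefore a different decision boundary, which is the sense in which hyperparameters determine the final functional form of the model.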
3 Unsupervised outlier detection techniques
In this article, we apply four unsupervised ODTs to well logs to identify the
formation depths that exhibit anomalous or outlier log responses. The ODTs
were used in an unsupervised manner without much hyperparameter tuning. Each
formation depth can be considered a sample, and the various logs acquired at a
specific depth can be considered features. Because the approach is
unsupervised, there is no target or desired outcome for a given set of feature
values (feature vector) of a sample. An unsupervised ODT processes the feature
vectors of the available samples, which contain both normal (inlier) and
anomalous (outlier) behavior, to identify the depths that exhibit outlier
behavior. Unsupervised ODTs are based on distance, density, decision boundary,
or affinity, which are used to quantify the relationships among the features
governing the inlier and outlier behavior of samples. In this section, we
introduce four unsupervised ODTs, namely, isolation forest (IF), one-class SVM
(OCSVM), local outlier factor (LOF), and density-based spatial clustering of
applications with noise (DBSCAN). In this study, all methods are implemented
using the scikit-learn package.
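The setup described above, with depths as samples and logs as features, can be sketched as follows. This is an illustrative outline under stated assumptions, not the workflow of the study: the synthetic four-log dataset is hypothetical, the standardization step is an added assumption, and every detector is left at its scikit-learn default hyperparameters apart from a fixed random seed.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical dataset: each row is a depth (sample), each column a log (feature).
rng = np.random.default_rng(42)
normal_depths = rng.normal(0.0, 1.0, size=(300, 4))
anomalous_depths = rng.normal(6.0, 1.0, size=(15, 4))
logs = np.vstack([normal_depths, anomalous_depths])

# Assumption: features are standardized so that no single log dominates distances.
X = StandardScaler().fit_transform(logs)

# The four ODTs named in the text, with default hyperparameters.
detectors = {
    "IF": IsolationForest(random_state=0),
    "OCSVM": OneClassSVM(),
    "LOF": LocalOutlierFactor(),
    "DBSCAN": DBSCAN(),
}

# IF, OCSVM, and LOF label outliers as -1; DBSCAN labels noise points as -1.
outlier_mask = {}
for name, detector in detectors.items():
    labels = detector.fit_predict(X)
    outlier_mask[name] = labels == -1
    print(f"{name}: {int(outlier_mask[name].sum())} of {X.shape[0]} depths flagged")
```

No target labels are passed to any detector. With untuned defaults, the four methods will generally disagree on how many depths to flag, which reflects the sensitivity to hyperparameters discussed earlier.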