Page 28 - Machine Learning for Subsurface Characterization

Unsupervised outlier detection techniques Chapter  1 13

             FIG. 1.2 Performances of the four unsupervised outlier detection techniques, namely, (A) isolation
             forest with hyperparameters: max_samples = 10, n_estimators = 100, max_features = 2, and
             contamination = 0.1; (B) one-class SVM with hyperparameters: nu = 0.2 and gamma = 0.04;
             (C) local outlier factor with hyperparameters: n_neighbors = 10, metric = "minkowski", and
             p = 2; and (D) DBSCAN with hyperparameters: eps = 1, min_samples = 5, metric = "minkowski",
             and p = 2, on the synthetic two-dimensional dataset containing 25 samples. Red samples (light gray in
             the print version) indicate outliers, and blue samples (dark gray in the print version) indicate inliers.
             All hyperparameters other than those listed here keep their default values.
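
             The hyperparameter names in the caption match those of the scikit-learn estimators, so the four
             configurations can be sketched as follows. This is a minimal sketch, not the code used to
             produce the figure: the 25-sample dataset generated by make_toy_data is a hypothetical stand-in
             for the synthetic data, and the fixed random_state is an added assumption for repeatability.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def make_toy_data(seed=0):
    """Hypothetical stand-in for the 25-sample 2-D dataset (not the book's data)."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(loc=0.0, scale=1.0, size=(20, 2))     # dense cloud
    outliers = rng.uniform(low=-6.0, high=6.0, size=(5, 2))    # scattered points
    return np.vstack([inliers, outliers])


def detect_outliers(X, seed=0):
    """Return one boolean outlier mask per technique, using the caption's settings."""
    detectors = {
        "isolation_forest": IsolationForest(
            max_samples=10, n_estimators=100, max_features=2,
            contamination=0.1, random_state=seed,  # random_state is an assumption
        ),
        "one_class_svm": OneClassSVM(nu=0.2, gamma=0.04),
        "local_outlier_factor": LocalOutlierFactor(
            n_neighbors=10, metric="minkowski", p=2,
        ),
        "dbscan": DBSCAN(eps=1.0, min_samples=5, metric="minkowski", p=2),
    }
    # The first three estimators return -1 for outliers and +1 for inliers;
    # DBSCAN returns cluster labels and marks noise samples with -1.
    return {name: det.fit_predict(X) == -1 for name, det in detectors.items()}


X = make_toy_data()
masks = detect_outliers(X)
for name, mask in masks.items():
    print(f"{name}: {int(mask.sum())} of {len(X)} samples flagged as outliers")
```

             Treating DBSCAN noise points (label -1) as outliers is the usual convention when a clustering
             method is used for outlier detection; all remaining hyperparameters are left at their defaults.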



                Comparing Figs. 1.1 and 1.2 highlights the effect of hyperparameter
             choice on outlier detection. A poor choice of hyperparameters can make one
             method perform worse than another on the same dataset. Unfortunately, when
             unsupervised ODTs are applied to well logs and subsurface geophysical data,
             nothing is known in advance about the degree of contamination, the outlier
             fraction, the bandwidth, or any other hyperparameter. For a high-dimensional
             subsurface dataset, it is challenging to visualize the data in its entirety
             and to identify hyperparameters suited to that dataset. In general, an
             unsupervised ODT therefore has to be applied to the dataset without
             hyperparameter tuning and without prior information about the
             hyperparameters. The primary motivation of our study is to identify the
             best-performing unsupervised ODT that needs minimal hyperparameter tuning.
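
             The sensitivity described in this paragraph can be illustrated with a small experiment. The
             sketch below, which assumes scikit-learn's IsolationForest and a hypothetical 25-sample toy
             dataset (not the data behind Figs. 1.1 and 1.2), refits the same detector with several assumed
             contamination values and counts how many samples each setting flags.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical toy data: 20 clustered samples plus 5 scattered ones.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),
    rng.uniform(low=-6.0, high=6.0, size=(5, 2)),
])


def flagged_counts(X, contaminations, seed=0):
    """Count samples flagged as outliers for each assumed contamination level."""
    counts = {}
    for c in contaminations:
        # Same seed each time, so only the decision threshold changes.
        model = IsolationForest(contamination=c, random_state=seed)
        counts[c] = int((model.fit_predict(X) == -1).sum())
    return counts


counts = flagged_counts(X, [0.05, 0.1, 0.2, 0.4])
for c, n_flagged in counts.items():
    print(f"contamination={c}: {n_flagged} of {len(X)} samples flagged")
```

             Because the contamination value only moves the threshold on the anomaly scores, the number of
             flagged samples follows whatever fraction the user assumes, which is exactly the information
             that is unavailable for real well-log data.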