Page 48 - Machine Learning for Subsurface Characterization


Unsupervised outlier detection techniques Chapter 1 33

For any specific decision threshold, balanced accuracy score and F1 score
should be used together to evaluate the reliability and precision of the
unsupervised ODT.
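
As a minimal sketch of reporting the two scores together at one decision threshold, the following pure-Python snippet computes balanced accuracy and F1 from the same confusion counts. The labels, anomaly scores, threshold value, and function names are hypothetical illustrations, not data or code from the study; outliers are assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN with outliers (label 1) as the positive class."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0  # flag as outlier at/above threshold
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of recall on outliers and recall on inliers (specificity)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (recall + specificity)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall for the outlier class."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical ground truth (1 = outlier) and anomaly scores for ten samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.55, 0.25, 0.90, 0.40, 0.05, 0.70]
threshold = 0.5  # assumed decision threshold

tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
f1 = f1_score(tp, fp, tn, fn)
print(f"balanced accuracy = {bal_acc:.3f}, F1 = {f1:.3f}")
```

Because both scores come from the same confusion counts, changing the threshold moves them together, which is why they are best compared at each specific threshold rather than in isolation.
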
DBSCAN is the most effective in detecting noise in data as outliers, while
IF and OCSVM perform slightly worse in detecting noisy data points as
outliers and have lower precision. DBSCAN, IF, and OCSVM are suitable for
detecting point outliers, when outliers are scattered around the inlier zone.
None of these methods are suitable when outliers occur as dense regions
in the feature space as collective outliers. Isolation forest exhibits great
performance in detecting contextual outliers when there are zones affected by
bad-hole conditions. Isolation forest also proved efficient in detecting
outliers when there is a mixture of outliers due to noise (point outliers) and
bad-hole conditions (contextual outliers) in the presence of an infrequently
occurring but relevant and distinct subgroup (which should not be treated as
an outlier merely because it occurs rarely and has distinct
characteristics). Isolation forest is by far the
most robust and reliable in detecting outliers and inliers in the log data.
Performance of unsupervised ODTs depends on the selection of features,
especially when detecting contextual outliers, which also requires
hyperparameter tuning for optimum performance. For example, shallow-sensing
logs improve the detection of depths where logs are adversely affected by bad
holes. Local outlier factor (LOF) is computationally expensive and needs
careful hyperparameter tuning for reliable and robust performance; by far,
LOF is the worst-performing unsupervised ODT.
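
For orientation, the four unsupervised ODTs compared above can be run side by side with scikit-learn, as in the rough sketch below. Everything in it is an assumption made for illustration: the synthetic two-feature data stand in for log measurements, and the contamination level, `eps`, `min_samples`, and `n_neighbors` values are arbitrary choices, not the settings used in the study.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for log data: a dense inlier cloud plus scattered point outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=-6.0, high=6.0, size=(20, 2))
X = StandardScaler().fit_transform(np.vstack([inliers, outliers]))

contamination = 0.1  # assumed fraction of outliers (hypothetical)

detectors = {
    "IF": IsolationForest(contamination=contamination, random_state=0),
    "OCSVM": OneClassSVM(nu=contamination, gamma="scale"),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=10),
}

labels = {}
for name, detector in detectors.items():
    raw = detector.fit_predict(X)
    # IF, OCSVM, and LOF return +1 (inlier) or -1 (outlier); DBSCAN returns
    # cluster ids with -1 for noise, so every non-noise sample is mapped to +1.
    labels[name] = np.where(raw == -1, -1, 1)
    print(name, "flagged", int((labels[name] == -1).sum()), "samples as outliers")
```

The number of samples each method flags depends strongly on these hyperparameters, so the counts from this sketch say nothing about the relative ranking reported above.
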

Appendix A Popular methods for outlier detection
See Fig. 1.A1.

FIG. 1.A1 Popular methods for outlier detection.
