
26   Machine learning for subsurface characterization


            5 Performance of unsupervised ODTs on the four validation datasets

            In this section, the performances of the four unsupervised ODTs (IF, OCSVM,
            LOF, and DBSCAN) are evaluated by comparing their unsupervised detections
            against the known labels in the validation datasets. The performance of each
            model is expressed in terms of the balanced accuracy score, F1 score, and
            ROC-AUC score (the area under the ROC curve). The balanced accuracy score is
            high when large fractions of the actual outliers and inliers in the data are
            correctly detected as outliers and inliers, respectively. For outlier
            detection, a good F1 score indicates both good recall and good precision,
            meaning that a large fraction of the actual outliers in the data are correctly
            detected as outliers and a large fraction of the detected outliers are truly
            outliers rather than inliers. A ROC-AUC score close to 1 indicates that a
            large fraction of the actual outliers and inliers are correctly detected
            without much sensitivity to the decision threshold of the unsupervised ODT
            model. These three metrics are simple evaluation metrics; for a more robust
            assessment, they should be appropriately weighted to account for the
            outlier-inlier imbalance (i.e., the number of positives/outliers is much
            smaller than the number of negatives/inliers). Appendix B presents the true
            positives, true negatives, false positives, and false negatives for certain
            unsupervised ODTs on certain datasets in the form of confusion matrices.
            Appendix C lists the hyperparameter values of the various models used for
            unsupervised outlier detection. Because our goal is to find the most reliable
            unsupervised outlier-detection method, these hyperparameters are not
            tuned/modified, and any remaining hyperparameters are kept at their default
            values. The values listed in Appendix C are held constant for all the
            numerical experiments on the four datasets. In a real-world scenario, without
            labels against which to compare and evaluate the outlier/inlier detections,
            the hyperparameters would need to be tuned based on a manual inspection of
            the detected outliers and inliers.
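
            The three metrics can be computed with scikit-learn; the following is a
            minimal sketch on a small, purely illustrative imbalanced label set (the
            labels and anomaly scores below are made up, not taken from the datasets
            in this chapter). Following the convention above, outliers are the
            positive class.

```python
# Illustrative sketch: scoring an unsupervised outlier detector against
# known validation labels. All labels/scores below are made-up toy data.
from sklearn.metrics import balanced_accuracy_score, f1_score, roc_auc_score

# 1 = outlier (positive class), 0 = inlier; note the outlier-inlier imbalance.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hard outlier/inlier labels assigned by the unsupervised ODT.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Continuous anomaly scores from the same ODT (higher = more anomalous),
# used for the threshold-independent ROC-AUC score.
y_score = [0.9, 0.4, 0.2, 0.1, 0.3, 0.2, 0.1, 0.1, 0.2, 0.8]

print(balanced_accuracy_score(y_true, y_pred))  # mean of outlier and inlier recall
print(f1_score(y_true, y_pred))                 # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))           # area under the ROC curve
```

            Because the class imbalance here is mild compared with real well-log data,
            the gap between the balanced accuracy score and the F1 score is small; on
            strongly imbalanced datasets the two can diverge sharply, which is why the
            text recommends weighting the metrics.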


            5.1 Performance on Dataset #1 containing noisy measurements

            The unsupervised ODT model performance is evaluated for three feature subsets
            referred to as FS1, FS2, and FS2*, where FS1 contains GR, RHOB, and DTC;
            FS2 contains GR, RHOB, and the logarithm of RT; and FS2* contains GR, RHOB,
            and RT. For the subsets FS1 and FS2* of Dataset #1, DBSCAN performs better
            than the other models, as indicated by the balanced accuracy score. For the
            subset FS1 of Dataset #1, DBSCAN correctly labels 176 of the 200 introduced
            noise samples as outliers and 3962 of the 4037 “normal” data points as
            inliers; consequently, DBSCAN achieves a balanced accuracy score of 0.93 and
            an F1 score of 0.78. For the subset FS2 of Dataset #1, the log transform of
            resistivity negatively impacts the outlier-detection performance, because the
            logarithmic transformation reduces the variability in the feature. On using
            deep resistivity (RT) as is (i.e., without logarithmic transformation) in the
            subset FS2*,
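
            As a consistency check, the DBSCAN scores reported above for subset FS1
            follow directly from the stated detection counts; the sketch below applies
            the standard definitions of balanced accuracy and F1 to those counts
            (only the counts come from the text, the arithmetic is generic).

```python
# Confusion counts for DBSCAN on subset FS1 of Dataset #1, as stated in the text.
tp = 176            # introduced noise samples correctly detected as outliers
fn = 200 - 176      # introduced noise samples missed (labeled inliers)
tn = 3962           # "normal" samples correctly labeled inliers
fp = 4037 - 3962    # "normal" samples wrongly labeled outliers

recall = tp / (tp + fn)             # true positive rate (outlier recall)
specificity = tn / (tn + fp)        # true negative rate (inlier recall)
precision = tp / (tp + fp)

balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)

print(round(balanced_accuracy, 2))  # 0.93, matching the reported score
print(round(f1, 2))                 # 0.78, matching the reported score
```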