Page 43 - Machine Learning for Subsurface Characterization

FIG. 1.7 Performance of outlier detection models in terms of balanced accuracy (BA) score for
various subsets of (A) Dataset #1, (B) Dataset #2, and (C) Dataset #3, and in terms of (D) ROC-AUC
and PR-AUC scores for Dataset #4.
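
The balanced accuracy score used in Fig. 1.7 matters most when outliers are rare. The following is a minimal sketch, assuming the usual definition of BA as the mean of the recall on outliers and the recall on inliers. The class counts are those reported for Dataset #2 (91 outliers, 4037 inliers). The two detectors are hypothetical and are not models from this chapter.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on outliers (label 1) and recall on inliers (label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def plain_accuracy(y_true, y_pred):
    """Fraction of depths whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Class counts reported for Dataset #2: 91 outlier depths, 4037 inlier depths.
y_true = [1] * 91 + [0] * 4037

# Hypothetical detector A: flags nothing, so every depth is called an inlier.
y_all_inlier = [0] * len(y_true)

# Hypothetical detector B: labels every depth correctly.
y_perfect = list(y_true)

acc_all_inlier = plain_accuracy(y_true, y_all_inlier)
ba_all_inlier = balanced_accuracy(y_true, y_all_inlier)
ba_perfect = balanced_accuracy(y_true, y_perfect)

print(f"plain accuracy, all-inlier detector:    {acc_all_inlier:.3f}")
print(f"balanced accuracy, all-inlier detector: {ba_all_inlier:.3f}")
print(f"balanced accuracy, perfect detector:    {ba_perfect:.3f}")
```

Under these assumptions, a detector that never flags an outlier still has a plain accuracy of 4037/4128 because inliers dominate. Its balanced accuracy is only 0.5, which is why BA is the more informative score for comparing detectors on data this imbalanced.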

5.2 Performance on Dataset #2 containing measurements affected
by bad holes

In the bad-hole dataset, model performance is evaluated for five feature subsets:
FS1, FS2, FS2**, FS3, and FS4. FS1 contains GR, RHOB, and DTC; FS2 contains
GR, RHOB, and RT; FS2** contains GR, RHOB, and RXO; FS3 contains
GR, RHOB, DTC, and RT; and FS4 contains GR, RHOB, DTC, and NPHI. In
each feature set, we have 91 depths (samples) labeled as outliers and 4037
depths labeled as inliers. Isolation forest (IF) performs better than the other methods
for all the feature sets, whereas DBSCAN and LOF perform the worst. IF performance
for FS2 is worse than for the other feature subsets because FS2 uses
RT, which is a deep-sensing log and is not much affected by bad holes.
Consequently, when RT (deep resistivity) is replaced with RXO (shallow resistivity)
in subset FS2**, the IF performance improves significantly, indicating the
need for shallow-sensing logs to better detect depths where logs are
adversely affected by bad holes. Subset FS3 is created by adding DTC (sonic)
to FS2. Of the four features in FS3, DTC is extremely sensitive to the effects
of bad holes, whereas RT is not. In doing so, the performance of IF
