Page 43 - Machine Learning for Subsurface Characterization

            FIG. 1.7 Performance of outlier detection models in terms of balanced accuracy (BA) score for
            various subsets of (A) Dataset #1, (B) Dataset #2, and (C) Dataset #3, and in terms of (D) ROC-AUC
            and PR-AUC scores for Dataset #4.
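
To make the balanced accuracy (BA) score used in Fig. 1.7 concrete, the following minimal sketch computes BA as the mean of the recall on outliers and the recall on inliers. It uses the class counts that the text reports for Dataset #2 (91 outliers, 4037 inliers). The function names and the "all-inlier" detector are hypothetical and serve only as illustration; they are not the authors' implementation.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on outliers (label 1) and recall on inliers (label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Class counts reported in the text for Dataset #2.
N_OUTLIERS, N_INLIERS = 91, 4037
y_true = [1] * N_OUTLIERS + [0] * N_INLIERS

# Hypothetical detector (an assumption for illustration) that labels
# every depth as an inlier and therefore finds no outliers at all.
y_all_inlier = [0] * len(y_true)

acc = accuracy(y_true, y_all_inlier)
ba = balanced_accuracy(y_true, y_all_inlier)
print(f"accuracy = {acc:.3f}, balanced accuracy = {ba:.3f}")
# prints: accuracy = 0.978, balanced accuracy = 0.500
```

With only about 2% of the depths labeled as outliers, plain accuracy rewards a detector that finds nothing, whereas BA drops to 0.5, which is why BA is the more informative score for such imbalanced datasets.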



            5.2 Performance on Dataset #2 containing measurements affected
            by bad holes

            In the bad-hole dataset, model performance is evaluated for five feature subsets:
            FS1, FS2, FS2**, FS3, and FS4. FS1 contains GR, RHOB, and DTC; FS2 contains
            GR, RHOB, and RT; FS2** contains GR, RHOB, and RXO; FS3 contains GR, RHOB,
            DTC, and RT; and FS4 contains GR, RHOB, DTC, and NPHI. Each feature subset
            has 91 depths (samples) labeled as outliers and 4037 depths labeled as inliers.
            Isolation forest (IF) performs better than the other methods for all the feature
            subsets, whereas DBSCAN and LOF perform the worst. IF performs worse on FS2
            than on the other feature subsets because FS2 uses RT, a deep-sensing log that
            is not much affected by bad holes. Consequently, when RT (deep resistivity) is
            replaced with RXO (shallow resistivity) in subset FS2**, the IF performance
            improves significantly, indicating the need for shallow-sensing logs to better
            detect depths where logs are adversely affected by bad holes. Subset FS3 is
            created by adding DTC (sonic) to FS2. FS3 thus has four features, of which DTC
            is extremely sensitive to the effects of bad holes, whereas RT is not. In doing so,
            the performance of IF