Page 46 - Machine Learning for Subsurface Characterization

Unsupervised outlier detection techniques Chapter  1 31



               TABLE 1.3 Performances of the four unsupervised ODTs on Dataset #3

                                                Dataset #3 result
               Method                Feature    Balanced accuracy   F1       ROC-AUC
                                     subset     score               score    score
               Isolation forest      FS1        0.91                0.81     0.97
                                     FS2        0.96                0.69     0.99
                                     FS3        0.92                0.84     0.99
                                     FS4        0.93                0.83     0.99
               One-class SVM         FS1        0.78                0.57     0.8
                                     FS2        0.72                0.47     0.75
                                     FS3        0.8                 0.61     0.81
                                     FS4        0.79                0.6      0.88
               Local outlier factor  FS1        0.8                 0.61     0.86
                                     FS2        0.73                0.24     0.66
                                     FS3        0.61                0.34     0.79
                                     FS4        0.71                0.34     0.73
               DBSCAN                FS1        0.75                0.95     NA
                                     FS2        0.8                 0.47     NA
                                     FS3        0.66                0.73     NA
                                     FS4        0.79                0.73     NA
               Visual representation of the performances in terms of balanced accuracy score is shown in Fig. 1.7C.
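
               The three scores reported in Table 1.3 can be illustrated with a minimal sketch. The toy labels, anomaly scores, and the 0.5 threshold below are hypothetical and are not taken from Dataset #3; the helper functions are illustrative stand-ins for the equivalent routines in a library such as scikit-learn. Note that the ROC-AUC needs a continuous anomaly score, whereas balanced accuracy and F1 need only hard labels; this may be why ROC-AUC is listed as NA for DBSCAN, which assigns labels only (an assumption, not stated in the table).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as outlier and 0 as inlier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of the outlier recall and the inlier recall.

    Assumes both classes are present in y_true.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the outlier class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores):
    """Probability that a random outlier receives a higher anomaly score
    than a random inlier (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outlier, 0 = inlier; higher score = more anomalous.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 0.5, 0.8]

# Hard labels from an arbitrary threshold (an assumption for illustration).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("Balanced accuracy:", round(balanced_accuracy(y_true, y_pred), 2))
print("F1 score:         ", round(f1_score(y_true, y_pred), 2))
print("ROC-AUC:          ", round(roc_auc(y_true, scores), 2))
```

               In this toy case both outliers are caught but two inliers are also flagged, so the outlier recall is perfect while precision suffers. The F1 score drops further than the balanced accuracy does, which is the same kind of gap visible in several rows of the table.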





             and photoelectric factor (PEF) logs. The offshore dataset was labeled using
             manual inspection, feature thresholding, and DBSCAN, followed by manual
             verification of the labels (outliers vs. inliers), to create Dataset #4.
             Consequently, Dataset #4 contains several manually labeled outliers. This
             comparative study focuses on IF, OCSVM, and LOF and evaluates their
             performances using the ROC-AUC and PR-AUC scores. This is a challenging
             dataset because seven logs from the offshore dataset are processed
             simultaneously by the unsupervised methods, and the results are then compared
             with manually verified labels. An increase in the number of features increases
             the dimensionality of the dataset, which leads to underperformance of the
             data-driven methods. IF and OCSVM perform equally well and significantly
             outperform the LOF method for both the PR-AUC and