Page 47 - Machine Learning for Subsurface Characterization




              TABLE 1.4 Performances of the four unsupervised ODTs on Dataset #4

                                      Dataset #4 result
                                     PR-AUC score        ROC-AUC score
              Isolation forest       0.89                0.98
              One-class SVM          0.88                0.99
              Local outlier factor   0.37                0.73
              Visual representation of the performances in terms of PR-AUC and ROC-AUC scores is shown in
              Fig. 1.7D.




            ROC-AUC scores. The ROC-AUC score for LOF indicates a marginal performance
            (not a poor performance); however, the ROC curve considers only recall and
            specificity and does not account for precision. In this case, the PR-AUC
            score of LOF indicates a very poor performance once precision is considered.
            The low PR-AUC score of LOF indicates that its outlier detection cannot be
            trusted because many of the samples it flags as outliers are actually
            inliers. PR and ROC curves and their AUC scores should be analyzed together
            for the best assessment of unsupervised ODTs. A visual representation of the
            performances in terms of PR-AUC and ROC-AUC scores is shown in Fig. 1.7D.
            ROC curves are appropriate when there is no class imbalance, whereas
            precision-recall curves are suitable for imbalanced datasets (Table 1.4).
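
The gap between ROC-based and precision-based assessments on an imbalanced dataset can be illustrated with a little arithmetic. The sketch below is only an illustration: the confusion-matrix counts are hypothetical (they are not taken from Dataset #4), and the helper name `rates` is an assumption introduced here. It shows how recall and specificity can both look acceptable while precision remains poor when outliers are rare.

```python
def rates(tp, fp, tn, fn):
    """Return (recall, specificity, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # fraction of true outliers that were detected
    specificity = tn / (tn + fp)   # fraction of true inliers kept as inliers
    precision = tp / (tp + fp)     # fraction of flagged samples that are true outliers
    return recall, specificity, precision


# Hypothetical counts: 1000 samples, 50 true outliers (5%) and 950 true inliers.
tp, fn = 40, 10     # outliers detected / outliers missed
fp, tn = 190, 760   # inliers wrongly flagged / inliers correctly kept

recall, specificity, precision = rates(tp, fp, tn, fn)
print(f"recall={recall:.2f} specificity={specificity:.2f} precision={precision:.2f}")
# prints: recall=0.80 specificity=0.80 precision=0.17
```

In this hypothetical case the ROC quantities (recall and specificity) are both 0.80, yet only about one in six flagged samples is a true outlier, which is the kind of behavior a PR curve exposes and an ROC curve hides.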



            6 Conclusions
            Four distinct well-log datasets containing outlier/inlier labels were used to
            perform a comparative study of the performances of four unsupervised outlier
            detection techniques (ODTs), namely, isolation forest (IF), one-class SVM
            (OCSVM), local outlier factor (LOF), and DBSCAN. The unsupervised ODTs were
            applied to each dataset without hyperparameter tuning and without any prior
            information about either the inliers or the outliers. Simple evaluation
            metrics designed for supervised classification methods, such as balanced
            accuracy score, F1 score, receiver operating characteristic (ROC) curve,
            precision-recall (PR) curve, and area under the curve (AUC), were used to
            evaluate the performance of the unsupervised ODTs on the labeled validation
            dataset. The PR curve, ROC curve, and AUC should be used together to
            accurately assess the sensitivity of the performance of the unsupervised
            ODTs to decision thresholds. A robust performance is obtained when the AUC
            is close to 1, indicating that the unsupervised ODT is not sensitive to
            decision thresholds.
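
The threshold sweep behind the ROC and PR curves can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the labels and scores are hypothetical, the function names are introduced here for illustration, ROC-AUC is integrated with the trapezoidal rule, and PR-AUC uses a step-wise sum (the average-precision convention), which may differ from how the reported scores were computed.

```python
def curve_points(labels, scores):
    """Sweep the decision threshold over every distinct score.

    labels: 1 = outlier, 0 = inlier. scores: higher means more outlying.
    Returns one (false_positive_rate, recall, precision) tuple per threshold,
    ordered from the strictest threshold to the most permissive one.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos, tp / (tp + fp)))
    return points


def roc_auc(labels, scores):
    """Area under the ROC curve (recall vs. false positive rate), trapezoidal rule."""
    pts = curve_points(labels, scores)
    fpr = [0.0] + [p[0] for p in pts]
    rec = [0.0] + [p[1] for p in pts]
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(fpr, fpr[1:], rec, rec[1:]))


def pr_auc(labels, scores):
    """Area under the PR curve as a step-wise sum of precision over recall gains."""
    area, prev_recall = 0.0, 0.0
    for _, recall, precision in curve_points(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical validation set: 3 outliers among 10 samples.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(f"ROC-AUC={roc_auc(labels, scores):.3f} PR-AUC={pr_auc(labels, scores):.3f}")
# prints: ROC-AUC=0.952 PR-AUC=0.917
```

Because both areas summarize every possible threshold rather than a single one, values close to 1 mean that almost any threshold separates outliers from inliers well, which is the sense in which the detector is insensitive to the decision threshold.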