Page 37 - Machine Learning for Subsurface Characterization


            scenario, these evaluation metrics for classification algorithms cannot be used
            for unsupervised ODT due to the lack of prior information about the outliers
            and inliers.
               Formulations of evaluation metrics for classification methods are based on
            true positives, true negatives, false positives, and false negatives [17]. For the
            purpose of assessing outlier detection methods, a true positive/negative refers
            to the number of outliers/inliers correctly detected as outliers/inliers by the
            unsupervised ODT. Along those lines, a false positive/negative refers to the
            number of inliers/outliers incorrectly detected as outliers/inliers by the
            unsupervised ODT. For example, when an actual inlier is detected as an outlier,
            it is referred to as a false positive. The following are some simple evaluation
            metrics that we use to compare the performances of unsupervised ODTs on the
            labeled validation dataset. Appendix B presents the true positives, true negatives,
            false positives, and false negatives for certain unsupervised ODTs on certain
            datasets. It is to be noted that these simple evaluation metrics can be improved
            by weighting the metrics to address the effects of outlier-inlier imbalance; that
            is, the number of positives (outliers) tends to be one order of magnitude smaller
            than the number of negatives (inliers).
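
As a concrete illustration (with a small hypothetical labeled set, not one of the datasets used in this chapter), the four confusion counts for an unsupervised ODT can be tallied from a labeled validation set, encoding outliers as 1 (positive) and inliers as 0 (negative):

```python
# Hypothetical labeled validation set: 1 = outlier (positive), 0 = inlier (negative).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # expert-assigned labels
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]   # flags produced by an unsupervised ODT

# Tally the four confusion counts defined above.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # outlier detected as outlier
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # inlier detected as inlier
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # inlier detected as outlier
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # outlier detected as inlier

print(tp, tn, fp, fn)  # -> 1 7 1 1
```

Note the outlier-inlier imbalance even in this toy set: two positives against eight negatives, which is why weighted variants of the metrics below can be preferable.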


            4.4.1 Recall
            Recall (also referred to as sensitivity) is the ratio of true positives to the sum of
            true positives and false negatives (i.e., the total number of actual outliers). It
            represents the fraction of outliers in the dataset correctly detected as outliers.
            Recall is expressed as

            \[ \text{Recall} = \frac{TP}{TP + FN} \tag{1.3} \]
               Recall is an important metric but should not be used in isolation because it
            does not account for the performance of the method in inlier detection. A recall
            close to 1 does not mean that the outlier detection method performs well,
            because the method may simultaneously produce a large number of false
            positives, that is, actual inliers detected as outliers. For example, when an
            unsupervised ODT detects every data point as an outlier, the recall is 1, but the
            specificity is 0, indicating that the performance of the unsupervised ODT is
            unreliable.
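
This degenerate case can be checked with a short sketch (hypothetical counts; the `recall` function simply implements Eq. 1.3):

```python
def recall(tp, fn):
    # Eq. (1.3): fraction of actual outliers that are detected as outliers
    return tp / (tp + fn)

# A degenerate unsupervised ODT that flags every sample as an outlier,
# applied to a hypothetical validation set with 2 outliers and 8 inliers:
tp, fn = 2, 0   # every actual outlier is flagged, so no false negatives
fp, tn = 8, 0   # but every inlier is flagged too, so no true negatives

print(recall(tp, fn))   # -> 1.0, a perfect recall from a useless detector
print(tn / (tn + fp))   # specificity -> 0.0
```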


            4.4.2 Specificity
            Specificity is the ratio of true negatives to the sum of true negatives and false
            positives (i.e., the total number of actual inliers). It represents the fraction of
            inliers in the dataset correctly detected as inliers by the unsupervised ODT.
            Specificity is expressed as

            \[ \text{Specificity} = \frac{TN}{TN + FP} \tag{1.4} \]
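
A matching sketch for Eq. (1.4), again using hypothetical confusion counts:

```python
def specificity(tn, fp):
    # Eq. (1.4): fraction of actual inliers that are detected as inliers
    return tn / (tn + fp)

# Hypothetical counts: 7 inliers kept as inliers, 1 inlier flagged as an outlier.
print(specificity(tn=7, fp=1))  # -> 0.875
```

Reporting recall and specificity together guards against the degenerate behaviors discussed above: a detector that flags everything scores 1 on recall but 0 on specificity, and one that flags nothing scores the reverse.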