scenario, these evaluation metrics for classification algorithms cannot be used
for unsupervised ODT due to the lack of prior information about the outliers
and inliers.
Formulations of evaluation metrics for classification methods are based on
true positives, true negatives, false positives, and false negatives [17]. For the
purpose of assessing outlier detection methods, true positives/negatives refer to the
number of outliers/inliers that are correctly detected as outliers/inliers by
the unsupervised ODT. Along the same lines, false positives/negatives refer to the
number of inliers/outliers that are incorrectly detected as outliers/inliers by the
unsupervised ODT. For example, when an actual inlier is detected as an outlier,
it is referred to as a false positive. The following are simple evaluation metrics
used to compare the performances of unsupervised ODTs on the labeled
validation dataset. Appendix B presents the true positives, true negatives,
false positives, and false negatives for certain unsupervised ODTs on certain
datasets. Note that these simple evaluation metrics can be improved
by weighting the metrics to address the effects of outlier-inlier imbalance; that
is, the number of positives (outliers) tends to be one order of magnitude smaller
than the number of negatives (inliers).
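As a concrete illustration of these counts, the following is a minimal sketch (not from the original text) that tallies true/false positives and negatives on a small labeled validation set, treating outliers as the positive class; the arrays y_true and y_pred and their values are hypothetical.

```python
import numpy as np

# Hypothetical ground-truth labels for a validation set: 1 = outlier, 0 = inlier
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
# Hypothetical flags produced by an unsupervised ODT on the same samples
y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # outliers correctly flagged as outliers
tn = np.sum((y_true == 0) & (y_pred == 0))  # inliers correctly left as inliers
fp = np.sum((y_true == 0) & (y_pred == 1))  # inliers incorrectly flagged as outliers
fn = np.sum((y_true == 1) & (y_pred == 0))  # outliers missed by the ODT
print(tp, tn, fp, fn)  # 2 6 1 1
```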
4.4.1 Recall
Recall (also referred to as sensitivity) is the ratio of true positives to the sum of
true positives and false negatives (i.e., the total number of actual/true outliers). It
represents the fraction of outliers in the dataset correctly detected as outliers. Recall
is expressed as
$$\text{Recall} = \frac{TP}{TP + FN} \tag{1.3}$$
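Continuing the hypothetical sketch above, recall follows directly from those counts:

```python
recall = tp / (tp + fn)  # 2 / (2 + 1) = 0.67: two of the three true outliers were found
print(recall)
```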
Recall is an important metric but should not be used in isolation because it
does not account for the performance of the method in detecting inliers. For
example, a recall close to 1 does not mean that the outlier detection method
performs well, because the method may simultaneously generate a large number
of false positives, that is, actual inliers detected as outliers. For instance, when an
unsupervised ODT detects every data point as an outlier, the recall is 1, but the
specificity is 0, indicating that the performance of the unsupervised ODT is
unreliable.
4.4.2 Specificity
Specificity is the ratio of true negatives to the sum of true negatives and false
positives (i.e., the total number of actual/true inliers). It represents the fraction of
inliers in the dataset correctly detected as inliers by the unsupervised ODT. Specificity is expressed as
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{1.4}$$
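To close the hypothetical sketch, specificity is computed the same way, and evaluating both metrics together exposes the degenerate "flag everything as an outlier" case described above:

```python
specificity = tn / (tn + fp)  # 6 / (6 + 1) = 0.86

# Degenerate ODT that flags every sample as an outlier
y_all = np.ones_like(y_true)
tp_d = np.sum((y_true == 1) & (y_all == 1))
fn_d = np.sum((y_true == 1) & (y_all == 0))
tn_d = np.sum((y_true == 0) & (y_all == 0))
fp_d = np.sum((y_true == 0) & (y_all == 1))
print(tp_d / (tp_d + fn_d))  # recall = 1.0
print(tn_d / (tn_d + fp_d))  # specificity = 0.0, so the ODT is unreliable
```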