Page 134 - Machine Learning for Subsurface Characterization
the limitations and assumptions of a metric and accordingly select a combina-
tion of evaluation metrics best suited for a specific predictive/data-driven
modeling task. An evaluation metric quantifies the likelihood of a predictive
model to correctly predict the future outcomes or that of a data-driven model
to correctly quantify the trends/patterns in a dataset. Regression tasks use eval-
uation metrics that are very different from those used for classification tasks [9].
A good evaluation metric should enable clear discrimination among various
models developed on a dataset and should be sensitive to variations in model
performances.
A popular evaluation metric for regression tasks is the coefficient of determination, R², which measures the fraction of variance in the targets that can be explained by a predictive/data-driven model. In simple terms, R² measures how well the variations in targets/outputs (y) can be explained by variations in features/inputs (x) using a certain predictive/data-driven model. R² is based on the principle that good models lead to small residuals. R² is the square of the correlation coefficient, r, which measures the strength of the linear relationship between two variables. Adjusted R² is a modification of R² that accounts for the number of features/inputs used in the predictive model. Unlike R², adjusted R² assigns a lower score to a model that uses several noninformative, low-importance inputs/features. A few limitations and assumptions of R² are as follows:
1. It cannot quantify the bias in the model predictions.
2. It only considers the linear relationships between targets and features.
3. It does not account for the nonlinear relationships between the targets and
features unless the targets are appropriately transformed.
4. A large R² indicates linear association specific to the model and to the dataset used to develop the model.
5. A large R² does not mean causation. It is only an indicator of correlation (association).
6. It overemphasizes large errors versus small errors.
7. It tends to overemphasize errors for samples having large-valued targets
versus those having small-valued targets.
8. Though R² is scaled between 0 and 1, it is a relative measure and not an absolute measure because it depends on the number of datapoints, the selected ranges of the features, and the number and order of the features used to build the model.
9. It does not consider variance in the features.
10. As more features are added, R² tends to increase even when the newly added features are not important. This is because of overfitting.
11. It is not suitable when the variance in the target is low, when there are few
samples/datapoints, and when the error in data is large.
12. R² of models cannot be compared across datasets.
Compared with R², mean absolute error (MAE) and root-mean-square error (RMSE) are better evaluation metrics. MAE is the average magnitude of