Page 134 - Machine Learning for Subsurface Characterization


            the limitations and assumptions of a metric and accordingly select a combination of evaluation metrics best suited for a specific predictive/data-driven modeling task. An evaluation metric quantifies the likelihood that a predictive model will correctly predict future outcomes or that a data-driven model will correctly quantify the trends/patterns in a dataset. Regression tasks use evaluation metrics that are very different from those used for classification tasks [9]. A good evaluation metric should enable clear discrimination among the various models developed on a dataset and should be sensitive to variations in model performance.
               A popular evaluation metric for regression tasks is the coefficient of determination, R², which measures the fraction of variance in the targets that can be explained by a predictive/data-driven model. In simple terms, R² measures how well the variations in the targets/outputs (y) can be explained by variations in the features/inputs (x) using a certain predictive/data-driven model. R² is based on the principle that good models lead to small residuals. R² is the square of the correlation coefficient, r, which measures the strength of the linear relationship between two variables. Adjusted R² is a modification of R² that accounts for the number of features/inputs used in the predictive model. Unlike R², adjusted R² gives a low score to a model that uses several noninformative, low-importance inputs/features. A few limitations and assumptions of R² are as follows:
              1. It cannot quantify the bias in the model predictions.
              2. It only considers the linear relationships between targets and features.
              3. It does not account for nonlinear relationships between the targets and features unless the targets are appropriately transformed.
              4. A large R² indicates a linear association specific to the model and to the dataset used to develop the model.
              5. A large R² does not mean causation; it is only an indicator of correlation (association).
              6. It overemphasizes large errors versus small errors.
              7. It tends to overemphasize errors for samples having large-valued targets versus those having small-valued targets.
              8. Though R² is scaled between 0 and 1, it is a relative measure and not an absolute measure because it depends on the number of datapoints, the selected ranges of the features, and the number and order of the features used to build the model.
              9. It does not consider variance in the features.
            10. As more features are added, R² tends to increase even when the newly added features are not important. This is a symptom of overfitting.
            11. It is not suitable when the variance in the target is low, when there are few samples/datapoints, or when the error in the data is large.
            12. R² of models cannot be compared across datasets.
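The behavior described in limitations 8 and 10 can be demonstrated numerically. The sketch below (an illustrative example, not from the book; it assumes NumPy and scikit-learn are available, and the synthetic data and model choices are the author's own assumptions) computes R² and adjusted R² for a linear model, then appends purely noninformative noise features. Training R² never decreases when features are added to a least-squares fit, whereas adjusted R² penalizes the extra inputs.

```python
# Illustrative sketch: R2 vs adjusted R2 on synthetic regression data.
# Dataset, feature counts, and helper names are hypothetical choices.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2: penalizes R2 for the number of features used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=(n, 1))
y = 3.0 * x[:, 0] + rng.normal(0.0, 1.0, size=n)  # linear target + noise

# Model with one informative feature.
r2_base = r2_score(y, LinearRegression().fit(x, y).predict(x))

# Same target, but with 10 extra noninformative (noise) features appended.
x_noisy = np.hstack([x, rng.normal(size=(n, 10))])
r2_noisy = r2_score(y, LinearRegression().fit(x_noisy, y).predict(x_noisy))

# Training R2 rises (or at worst stays equal) with the useless features ...
print(r2_noisy >= r2_base)  # True
# ... while adjusted R2 discounts them, flagging the noninformative inputs.
print(adjusted_r2(r2_base, n, 1), adjusted_r2(r2_noisy, n, 11))
```

Because the ratio (n − 1)/(n − k − 1) exceeds 1 whenever at least one feature is used, adjusted R² is always at or below R², and the gap widens as more features are added.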
                           2
            Compared with R , mean absolute error (MAE) and root-mean-square error
            (RMSE) are better evaluation metrics. MAE is the average magnitude of