measurements range from 10⁻² ohm-m (brine-filled formation) to 10³ ohm-m (low-porosity formation) and tend to exhibit a log-normal distribution. To reduce the right skewness (i.e., mean ≫ mode) and large variance observed in the resistivity data relative to other logs (features), we transformed resistivity to its logarithm. This reduces its skewness and variability and improves the model's predictive performance, as demonstrated in subsequent sections.
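As an illustration, the transform is a one-liner; the column name RT and the resistivity values below are hypothetical, not from the dataset used in this chapter:

```python
import numpy as np
import pandas as pd

# Hypothetical resistivity log (ohm-m); values spanning several decades
# produce the right-skewed, roughly log-normal distribution described above.
logs = pd.DataFrame({"RT": [0.5, 2.0, 8.0, 40.0, 150.0, 900.0]})

# Log-transform resistivity to reduce its skewness and variance before
# using it as a feature alongside the other logs.
logs["log10_RT"] = np.log10(logs["RT"])

print(logs["RT"].skew(), logs["log10_RT"].skew())  # skewness drops markedly
```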


            4.2.2 Feature scaling: Use of robust scaler
            A dataset generally contains features that significantly differ from each other in
            terms of magnitude, unit, and range. This tends to bias the machine learning
            methods based on distance, volume, density, and gradients [16]. Without fea-
            ture scaling, a few features will dominate during the model development.
For instance, features with high magnitudes contribute far more to distance calculations than features with low magnitudes, which adversely affects, for example, k-nearest neighbor classification/regression and principal component analysis. Without feature scaling, samples will exhibit high density along a few feature dimensions and low density along the others.
            Feature scaling is an important aspect of data preprocessing that improves the
            performance of the data-driven models. For methods based on distance, volume,
            and density, it is essential to ensure that the features have similar or near similar
            scales for improved performance. For methods based on gradient-descent opti-
            mization and other forms of gradients, it is recommended to use scaled features
            for fast convergence. For neural networks, it is crucial to have the features
scaled between minimum and maximum values, preferably 0 to 1 or −1 to 1,
            for fast and robust convergence. Notably, tree-based methods such as random
            forest regression/classification, AdaBoost, and isolation forest do not require
feature scaling. Data from different logs usually span very different ranges. For example, the RHOB (1.95–2.95 g/cc), porosity (0.0–0.2 fraction), and GR (50–250 gAPI) logs have vastly different scales.
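To make the dominance effect concrete, the following sketch (with made-up sample values) shows a Euclidean distance governed almost entirely by the high-magnitude GR feature until both features are rescaled:

```python
import numpy as np

# Two hypothetical samples described by (GR in gAPI, porosity in fraction).
a = np.array([60.0, 0.05])
b = np.array([240.0, 0.20])

# Unscaled: the GR difference (~180) swamps the porosity difference (~0.15).
print(np.linalg.norm(a - b))             # ~180.0, effectively GR alone

# Dividing by the approximate feature ranges quoted above puts both
# features on a comparable footing, so each contributes to the distance.
ranges = np.array([200.0, 0.2])          # GR: 50-250 gAPI; porosity: 0.0-0.2
print(np.linalg.norm((a - b) / ranges))  # ~1.17, both features matter
```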
               For purposes of feature scaling, we used the robust scaling method, which
            can be expressed mathematically as

x_i^s = (x_i − Q_1(x)) / (Q_3(x) − Q_1(x))        (1.2)
where x_i^s is the scaled feature x for the ith sample, x_i is the unscaled feature x for the ith sample, Q_1(x) is the first quartile of the distribution of feature x, and Q_3(x) is the third quartile of the distribution of feature x. The first and third quartiles represent the medians of the lower and upper halves of the data, respectively, and are not influenced by outliers.
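As a minimal sketch, Eq. (1.2) can be implemented directly with NumPy quartiles (the GR values below are made up). Note that scikit-learn's RobustScaler differs slightly: it centers on the median rather than the first quartile before dividing by the interquartile range:

```python
import numpy as np

def robust_scale(x):
    """Apply Eq. (1.2): (x_i - Q1(x)) / (Q3(x) - Q1(x))."""
    q1, q3 = np.percentile(x, [25, 75])
    return (x - q1) / (q3 - q1)

# Hypothetical GR log (gAPI) with one extreme outlier; the quartiles,
# and hence the scaling of the other samples, are barely affected by it.
gr = np.array([55.0, 80.0, 110.0, 150.0, 220.0, 2000.0])
print(robust_scale(gr))
```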
We perform robust scaling on the features (logs) because it overcomes the limitations of other scaling methods, such as the Standard scaler, which assumes the data are normally distributed, and the MinMax scaler, which assumes that the feature cannot exceed certain bounds/limits due to the physics or measurement governing the feature. For example, the MinMax scaler is suitable