Page 258 - Machine Learning for Subsurface Characterization



            2 Dataset
            The goal of this study is to generate NMR T2 distribution from other easy-to-
            acquire subsurface information and compare the performance of various models
            in this log-synthesis task. The dataset used for the comparison contains raw
            and inverted logs acquired from 575 depth points in a vertical well drilled in a
            shale formation. The 300-ft shale interval of interest for our investigation
            comprises seven distinct geological formations, as shown in Fig. 8.1.
            Formations F1 to F3 are source rock shale, and formations F4 to F7 are
            clay-rich dolomudstone (Fig. 8.1).
               Two distinct types of features/inputs are used in this study, namely, raw logs
            and inverted logs. For purposes of comparison, we separately build the shallow
            and deep models on each of the two types of features. The twelve raw logs used
            in this investigation are five array-induction resistivity logs at various depths
            of investigation (AF10, AF20, AF30, AF60, and AF90), caliper (DCAL),
            compressional sonic (DTCO), shear sonic (DTSM), gamma ray (GR), neutron
            porosity (NPOR), photoelectric factor (PEFZ), and formation density (RHOZ)
            logs. The ten inverted
            logs used in this investigation are anhydrite, calcite, chlorite, dolomite, illite,
            K-feldspar, quartz, free water, oil, and bound-water logs. The inverted logs
            are computed by data inversion of the raw resistivity, neutron, density, and
            gamma ray logs. Inverted logs can be considered as specially engineered
            features to facilitate the model training. Compared with inverted logs, raw
            logs require deeper networks for the same learning task, which demands more
            computation time and memory. However, raw logs have the advantage that no
            preprocessing infrastructure is required. Inverted composition logs have
            previously been used for NMR T2 synthesis [15, 16].
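
            The two feature sets described above can be laid out as separate input
            matrices that share one target matrix. The following is a minimal sketch
            only: the log mnemonics and the 575 depth points come from the text, the
            64 T2 bins per depth come from Section 3, the inverted-log column names
            are hypothetical labels, and all values are random placeholders rather
            than real well data.

```python
import numpy as np

# Raw-log mnemonics as listed in the text.
RAW_LOGS = ["AF10", "AF20", "AF30", "AF60", "AF90", "DCAL",
            "DTCO", "DTSM", "GR", "NPOR", "PEFZ", "RHOZ"]
# Hypothetical column labels for the ten inverted logs.
INVERTED_LOGS = ["anhydrite", "calcite", "chlorite", "dolomite", "illite",
                 "k_feldspar", "quartz", "free_water", "oil", "bound_water"]

N_DEPTHS = 575    # depth points in the well
N_T2_BINS = 64    # discrete T2 amplitudes per depth point

rng = np.random.default_rng(0)
# Placeholder values only; real logs would be loaded from file instead.
X_raw = rng.normal(size=(N_DEPTHS, len(RAW_LOGS)))
X_inv = rng.random(size=(N_DEPTHS, len(INVERTED_LOGS)))
Y_t2 = rng.random(size=(N_DEPTHS, N_T2_BINS))

# Each model type is trained once per feature set against the same targets.
feature_sets = {"raw": X_raw, "inverted": X_inv}
for name, X in feature_sets.items():
    print(name, X.shape, "->", Y_t2.shape)
```

            Keeping the two feature matrices separate mirrors the comparison in the
            text, where each model is built once on raw logs and once on inverted logs.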

            3 Shallow-learning models
            Six shallow-learning models were trained using supervised learning to synthesize
            the 64 discrete T2 amplitudes that constitute the entire NMR T2 distribution.
            These regression-type models include ordinary least squares (OLS), least absolute
            shrinkage and selection operator (LASSO), and ElasticNet, which are simple linear
            regression models. Support vector regression (SVR), an extension of the
            support vector classifier (SVC), was used as the fourth regression model. The SVR
            model can predict only one target at a time. The fifth model is the k-nearest
            neighbor regressor (kNNR), which is an extension of the kNN classifier and can
            simultaneously predict multiple targets. The sixth model is a simple neural
            network based on multilayer perceptron for simultaneous prediction of multiple
            targets.
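
            The six models listed above could be set up as follows. This is a sketch
            that assumes scikit-learn, which the text does not name. All
            hyperparameters and the random placeholder data are illustrative
            assumptions, not the settings used in the study. Because SVR predicts one
            target at a time, it is wrapped so that one SVR is fitted per T2 bin.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))    # placeholder features (e.g., 10 inverted logs)
Y = rng.random(size=(120, 64))    # placeholder 64-bin T2 amplitudes

# Hyperparameters below are arbitrary illustrative choices.
models = {
    "OLS": LinearRegression(),
    "LASSO": Lasso(alpha=0.01),
    "ElasticNet": ElasticNet(alpha=0.01, l1_ratio=0.5),
    # SVR is single-target, so one independent SVR is fitted per T2 bin.
    "SVR": MultiOutputRegressor(SVR(kernel="rbf")),
    "kNNR": KNeighborsRegressor(n_neighbors=5),
    "MLP": MLPRegressor(hidden_layer_sizes=(32,), max_iter=200, random_state=0),
}

predictions = {}
with warnings.catch_warnings():
    # Short training runs on random data may not converge; ignore that here.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for name, model in models.items():
        model.fit(X[:100], Y[:100])                 # first 100 rows for training
        predictions[name] = model.predict(X[100:])  # last 20 rows held out

for name, Y_hat in predictions.items():
    print(name, Y_hat.shape)
```

            Every model returns one 64-value T2 distribution per held-out depth, so
            the same error metric can be applied to all six when comparing them.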

            3.1 Ordinary least squares

            OLS is the simplest linear regression model. The OLS model is built by minimizing
            the cost/loss function shown in Eq. (8.1), formulated as the square of the L2 norm