Page 176 - Intelligent Digital Oil And Gas Fields

Components of Artificial Intelligence and Data Analytics     135


              4.3.2 Data Mining, Multivariate, Root-Cause, and Performance Analysis

              As a statistical discipline, DM has been used for more than half a century,
              and it has become far more widely applied since the emergence of Big Data.
              Some of the fastest DM development is now seen in social networking,
              recommendation systems, and online commerce, which continuously generate
              enormous volumes of data that must be analyzed and interpreted to support
              decision-making (Leskovec et al., 2014; Hallac et al., 2015). In exploration
              and production (E&P), the more systematic use of DM techniques appears to
              correlate with the diminishing availability of conventional hydrocarbon
              resources and the rise of unconventional reservoirs (e.g., shale plays) as
              the main source of oil and gas.
                 Numerous publications on the use of DM in the oil and gas industry have
              appeared in recent years, and Bravo et al. (2014) report that DM ranks
              among the most frequently Web-searched terms within AIPA technologies.
              This section briefly summarizes a few recent applications of nonlinear
              multivariate prediction, classification, and root-cause analysis.
                 Zhong et al. (2015) and Gao and Gao (2013) compared standard univariate
              linear regression and multivariate adaptive regression splines (MARS) with
              a few more advanced ML techniques, such as SVM, RF, and the gradient
              boosting machine (GBM), to predict production and support production
              optimization for almost 500 unconventional wells in the Permian Basin and
              the Eagle Ford Shale, respectively. Predictor variables included a wide
              range of categorical and continuous operational and completion data, such
              as surface location, well architecture (operator, well azimuth, angle,
              length), stimulation details (fracture fluid, proppant amount, etc.), and
              geological data such as permeability, porosity, viscosity, and other
              metrics. In both studies, the production metrics were cumulative oil
              production over various periods. To compare the predictive performance of
              the different methods, Zhong et al. (2015) adopted two objective metrics:
              the average absolute error (AAE) and the mean squared error (MSE). They
              also evaluated how well each ML method tolerates missing values, one of
              the most common issues in real-world data sets.
              In terms of the overall quality of the predictive fit, as measured by AAE
              and MSE, RF performed best, which is consistent with the earlier
              observations in Section 4.2.2.
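                 As a rough illustration of how the two metrics rank competing models,
              the Python sketch below computes AAE and MSE for each model's predictions
              and sorts the models from best to worst. It assumes that AAE is the plain
              mean of absolute residuals, (1/n) Σ|y_i − ŷ_i|, and that MSE is
              (1/n) Σ(y_i − ŷ_i)²; the cited studies may define or normalize these
              differently. The function names (aae, mse, rank_models) and the numbers in
              the usage example are hypothetical and are not data from either study.

```python
"""Minimal sketch: rank candidate production models by AAE and MSE.

Assumption: AAE is the mean absolute residual. Function names and the
example numbers are hypothetical, not taken from the cited studies.
"""
from typing import Dict, List, Sequence, Tuple


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Each observed value must be paired with exactly one prediction.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def aae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    _check(actual, predicted)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum((y_i - yhat_i) ** 2)."""
    _check(actual, predicted)
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rank_models(
    actual: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float, float]]:
    """Return (model, AAE, MSE) tuples, best first (lowest AAE, then lowest MSE)."""
    scores = [(name, aae(actual, pred), mse(actual, pred))
              for name, pred in predictions.items()]
    return sorted(scores, key=lambda s: (s[1], s[2]))


if __name__ == "__main__":
    # Hypothetical cumulative-production values for three wells.
    observed = [100.0, 200.0, 300.0]
    candidates = {
        "linear_regression": [110.0, 190.0, 320.0],
        "random_forest": [102.0, 198.0, 305.0],
    }
    for name, a, m in rank_models(observed, candidates):
        print(f"{name}: AAE={a:.2f}, MSE={m:.2f}")
    # random_forest: AAE=3.00, MSE=11.00
    # linear_regression: AAE=13.33, MSE=200.00
```

              Sorting on AAE first and breaking ties with MSE is an arbitrary choice
              made for this sketch. Because MSE squares the residuals, it penalizes a
              model with a few large misses more heavily than AAE does, so the two
              metrics can disagree on the ranking.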
                 Decision trees are another DM technique frequently used for multivariate
              nonlinear prediction. For example, Maucec et al. (2015) used classification
              and regression tree (CART) analysis to investigate whether,