Page 176 - Intelligent Digital Oil And Gas Fields
Components of Artificial Intelligence and Data Analytics 135
4.3.2 Data Mining, Multivariate, Root-Cause, and Performance Analysis
As a statistical discipline, DM has been used for more than half a century, and
it has become far more applicable and widely used with the emergence of Big
Data. Interestingly, some of the fastest DM development is now taking place
in social networking, recommendation systems, and online commerce, which
continuously generate enormous volumes of data subject to analysis,
interpretation, and decision-making (Leskovec et al., 2014; Hallac et al.,
2015). In E&P, the more systematic use of DM techniques appears to coincide
with the diminishing availability of conventional hydrocarbon resources and
the rise of unconventional reservoirs (e.g., shale plays) as the main source of
oil and gas.
Numerous publications on the use of DM in the oil and gas industry have
emerged in recent years, and Bravo et al. (2014) reported that DM ranks among
the most frequently Web-searched terms within AIPA technologies. This section
briefly summarizes a few recent applications of nonlinear multivariate
prediction, classification, and root-cause analysis.
Zhong et al. (2015) and Gao and Gao (2013) cross-evaluated standard
univariate linear regression and multivariate adaptive regression splines
(MARS) against a few more advanced ML techniques, such as SVM, RF, and
gradient boosted machine (GBM), to predict production performance and
support the optimization of almost 500 unconventional wells in the Permian
Basin and the Eagle Ford Shale, respectively. Predictor variables included a
wide range of categorical and continuous operational and completion well
data, such as surface location, well architecture (operator, well azimuth,
angle, length), and stimulation details (fracture fluid, proppant amount,
etc.), as well as geological data such as permeability, porosity, viscosity,
and other metrics. The production metrics in both studies covered oil
production accumulated over various periods. To compare the predictive performance
of different methods, Zhong et al. (2015) adopted two objective metrics:
the average absolute error (AAE) and the mean squared error (MSE). In
addition, they evaluated how well each ML method tolerates missing values,
one of the most common issues in real-world data sets. In terms of the
overall quality of the predictive fit, as measured by AAE and MSE, RF
performed best, which is consistent with the observations discussed in
Section 4.2.2.
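To make the two comparison metrics concrete, the following minimal sketch computes AAE and MSE for several candidate models and ranks them. It assumes AAE follows the usual definition (1/n) Σ |y_i − ŷ_i| (often called mean absolute error) and MSE = (1/n) Σ (y_i − ŷ_i)². The function names `aae`, `mse`, and `rank_methods`, the model labels, and all numbers are hypothetical illustrations, not code, data, or results from Zhong et al. (2015).

```python
"""Compare candidate regression models by average absolute error (AAE) and
mean squared error (MSE).

All observed and predicted values are hypothetical placeholders chosen for
illustration; they are not data or results from the studies cited in the text.
"""
from typing import Dict, List, Sequence, Tuple


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    # Both metrics need paired, non-empty observations and predictions.
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def aae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    _check(observed, predicted)
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum((y_i - yhat_i)^2)."""
    _check(observed, predicted)
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def rank_methods(
    observed: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float, float]]:
    """Return (name, AAE, MSE) per model, best first (lowest AAE, then lowest MSE)."""
    scores = [(name, aae(observed, pred), mse(observed, pred))
              for name, pred in predictions.items()]
    return sorted(scores, key=lambda s: (s[1], s[2]))


if __name__ == "__main__":
    # Hypothetical cumulative oil production for five wells (arbitrary units).
    observed = [120.0, 95.0, 143.0, 80.0, 110.0]
    # Hypothetical predictions from three unnamed models.
    predictions = {
        "model_a": [100.0, 110.0, 120.0, 95.0, 100.0],
        "model_b": [115.0, 100.0, 135.0, 88.0, 104.0],
        "model_c": [118.0, 97.0, 140.0, 83.0, 108.0],
    }
    for name, a, m in rank_methods(observed, predictions):
        print(f"{name}: AAE={a:.2f}  MSE={m:.2f}")
```

With these placeholder values the script lists model_c first (AAE = 2.40, MSE = 6.00) and model_a last (AAE = 16.60, MSE = 295.80). Sorting by AAE and breaking ties with MSE is only one possible choice. Because MSE squares the residuals, it penalizes a few large misses more heavily than AAE, so the two metrics can rank models differently when one model makes occasional large errors.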
Decision trees are another frequently used DM technique for multivariate
nonlinear prediction. For example, Maucec et al. (2015) applied
classification and regression tree (CART) analysis to investigate whether,