Page 154 - Intelligent Digital Oil And Gas Fields
Components of Artificial Intelligence and Data Analytics 117
Although data mining is rapidly and deservedly gaining popularity, particularly
in conjunction with Big Data analytics (and DOF applications are certainly no
exception), a caveat exists: a data mining analyst may discover patterns that
are meaningless because the data do not actually support them. This effect,
which statisticians call Bonferroni's Principle (Leskovec et al., 2014), may,
for example, produce statistical artifacts rather than evidence of what is
being sought and lead to unrealistic predictive models. One remedy is the
Bonferroni correction, which applies when several dependent or independent
statistical tests are performed simultaneously on a single data set: each test
is evaluated at the significance level α/m instead of α, where m is the number
of tests.
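The Bonferroni correction amounts to a simple change of threshold. The minimal Python sketch below (all function names are hypothetical) flags a test as significant only when its p-value falls below α/m, and then simulates Bonferroni's Principle by generating p-values for m tests in which no real effect exists. It assumes, as holds for continuous test statistics under a true null hypothesis, that such p-values are uniformly distributed on [0, 1].

```python
import random
from typing import List, Tuple


def bonferroni_significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag each test as significant only if its p-value is below alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]


def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of spurious 'discoveries' among m uncorrected null tests."""
    return m * alpha


def simulate_null_tests(m: int, alpha: float = 0.05, seed: int = 0) -> Tuple[int, int]:
    """Count discoveries with and without correction when no real effect exists.

    Assumption: under a true null hypothesis each p-value is Uniform(0, 1).
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(m)]
    uncorrected = sum(p < alpha for p in p_values)
    corrected = sum(bonferroni_significant(p_values, alpha))
    return uncorrected, corrected


if __name__ == "__main__":
    m = 1000
    print("expected spurious hits (uncorrected):", expected_false_positives(m, 0.05))
    uncorrected, corrected = simulate_null_tests(m)
    print("uncorrected discoveries:", uncorrected)
    print("Bonferroni-corrected discoveries:", corrected)
```

With m = 1,000 tests and α = 0.05, about 50 null tests are expected to look significant by chance alone, whereas the corrected per-test threshold of 0.00005 makes even one spurious hit unlikely (the expected number drops to 0.05). Because the correction follows from the union bound, it holds whether or not the tests are dependent; the price is conservatism, since real but weak effects may also be missed.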
4.2.2 Statistical and Machine Learning
Although the terms statistical learning and ML differ by name, they are quite
similar, and, in fact, both types of learning are inseparably intertwined. Sta-
tistical learning refers to the set of tools for modeling and understanding com-
plex and large-scale data sets, such as Big Data. It is a fairly recently developed
area of statistics and largely complements the developments in computer
science (e.g., advanced data management and cloud computing) and ML. ML
addresses the question of “how to build computers that improve automatically
through experience” (Jordan and Mitchell, 2015). This section gives a brief
overview of the core ML methods and outlines some trends and prospects for
future developments. It summarizes the most popular ML techniques, highlights
the three main ML paradigms, and provides characteristic examples. As ML is
becoming increasingly popular in the E&P industry, a few successful
applications relevant to the DOF are presented in Section 4.3.
Conceptually, ML algorithms can be viewed as searching through a large space
of candidate programs to identify a program that optimizes a specified
performance metric or objective. ML algorithms vary greatly depending on the
nature of the problem, both in how they represent candidate programs (e.g.,
decision trees or mathematical functions) and in how they search through that
space (e.g., optimization methods).
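To make the "search over candidate programs" view concrete, the following minimal Python sketch treats each decision stump (a one-threshold rule) as a candidate program and exhaustively searches for the one that maximizes classification accuracy on a toy data set. The data, the function names, and the choice of accuracy as the performance metric are illustrative assumptions, not a method taken from the cited sources; practical ML algorithms replace the exhaustive loop with far more efficient search, such as gradient-based optimization.

```python
from typing import List, Tuple

# A candidate "program": (threshold, flipped). It predicts 1 when x > threshold,
# and the opposite label when flipped is True.
Stump = Tuple[float, bool]


def predict(stump: Stump, x: float) -> int:
    threshold, flipped = stump
    return int((x > threshold) != flipped)


def accuracy(stump: Stump, xs: List[float], ys: List[int]) -> float:
    """Performance metric: fraction of samples the stump labels correctly."""
    correct = sum(predict(stump, x) == y for x, y in zip(xs, ys))
    return correct / len(xs)


def search_stumps(xs: List[float], ys: List[int]) -> Tuple[Stump, float]:
    """Exhaustively search the candidate space and return the best stump."""
    values = sorted(set(xs))
    # Candidate thresholds: one below the minimum plus midpoints of neighbors.
    thresholds = [values[0] - 1.0] + [
        (a + b) / 2 for a, b in zip(values, values[1:])
    ]
    candidates = [(t, flipped) for t in thresholds for flipped in (False, True)]
    # max() keeps the first candidate that reaches the highest accuracy.
    best = max(candidates, key=lambda s: accuracy(s, xs, ys))
    return best, accuracy(best, xs, ys)


if __name__ == "__main__":
    # Hypothetical toy data: a single feature and a binary label.
    xs = [0.10, 0.40, 0.35, 0.80, 0.90, 0.75]
    ys = [0, 0, 0, 1, 1, 1]
    best, score = search_stumps(xs, ys)
    print(f"best threshold={best[0]:.3f}, flipped={best[1]}, accuracy={score:.2f}")
```

On this toy data the search selects the threshold 0.575 (halfway between 0.40 and 0.75) with an accuracy of 1.00, because that rule separates the two classes perfectly. Swapping in a different representation (e.g., deeper trees) or a different search strategy changes the algorithm, but not the underlying idea.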
Regardless of the algorithm, however, the vast volume of Big Data makes it
imperative that ML techniques appropriate for DOF applications share a common
denominator: highly scalable solutions that support cloud and HPC platforms,
real-time analytics, and the rapidly expanding IoT, all with robust and
resilient cybersecurity mechanisms (see Chapter 2, Instrumentation and
Measurement). For more information, see The Elements of Statistical Learning:
Data Mining, Inference and Prediction by Hastie et al. (2011), An Introduction
to Statistical Learning: with Applications in R by