Page 155 - Intelligent Digital Oil And Gas Fields
P. 155
118 Intelligent Digital Oil and Gas Fields
James et al. (2014), and Jump-start Machine Learning in R: Apply Machine
Learning with R Now by Brownlee (2014). The latter two references focus
specifically on applications of the ML algorithms and techniques in pro-
gramming language R, which has become a de facto standard for statistical
computing (The R Foundation, 2017).
ML methods can be classified into three main paradigms: supervised
learning, unsupervised learning, and reinforcement learning (RL) (Jordan
and Mitchell, 2015), which are summarized below and in Table 4.4.
• Supervised learning. Let us assume an ML system with a set of input param-
eters (x i ; i¼1,…,n), called predictors and associated output/measured var-
iables called the responses (y i ). The supervised learning system generally
yields its prediction via a learned mapping function f(x), which produces
an output y i for each x i or a probability distribution p(yjx). The objective
is to design and fit a model that finds a relation between the response and
predictors, with the objective of accurately predicting the response for
future observations (predictions or forecasts). In supervised learning,
response variables are usually characterized as quantitative (also referred
to as continuous) or qualitative (also known as categorical). It is a common
practice in data science to refer to problems with a quantitative response
as regression problems and to those with a categorical response as classification
problems. A variant of regression and classification methods are the
so-called tree-based methods. These methods work on the principle of
segmenting the predictor space into a number of simple regions. To make
a prediction for a given observation, the tree-based methods usually use
statistical moments such as mean or variance. As the set of splitting rules
tosegmentthepredictorspacecanbeconvenientlyrepresentedasatree,it
visually makes the decision process significantly easier; these types of
approaches are also referred to as decision trees methods.
• Unsupervised learning addresses more challenging situations, where for
every observation i¼1,…,n, one finds a vector of measurements x i
but no associated response y i . Hence, it is not possible to fit a linear
regression model because there is no response variable to predict. In such
conditions, finding a solution is less transparent and the approach is
referred to as unsupervised. The two main classes of unsupervised learning
methods are the cluster analysis or clustering and the so-called dimensionality
reduction methods. The objective of cluster analysis is to determine, on
the basis of variables or parameters x 1 , …, x n , whether these observations
can be classified into relatively distinct groups called clusters and if there is
a possibility to represent individual clusters with their single