Page 155 - Intelligent Digital Oil And Gas Fields
P. 155

118                                       Intelligent Digital Oil and Gas Fields


          James et al. (2014), and Jump-start Machine Learning in R: Apply Machine
          Learning with R Now by Brownlee (2014). The latter two references focus
          specifically on applications of the ML algorithms and techniques in pro-
          gramming language R, which has become a de facto standard for statistical
          computing (The R Foundation, 2017).
             ML methods can be classified into three main paradigms: supervised
          learning, unsupervised learning, and reinforcement learning (RL) (Jordan
          and Mitchell, 2015), which are summarized below and in Table 4.4.
          •  Supervised learning. Let us assume an ML system with a set of input param-
             eters (x i ; i¼1,…,n), called predictors and associated output/measured var-
             iables called the responses (y i ). The supervised learning system generally
             yields its prediction via a learned mapping function f(x), which produces
             an output y i for each x i or a probability distribution p(yjx). The objective
             is to design and fit a model that finds a relation between the response and
             predictors, with the objective of accurately predicting the response for
             future observations (predictions or forecasts). In supervised learning,
             response variables are usually characterized as quantitative (also referred
             to as continuous) or qualitative (also known as categorical). It is a common
             practice in data science to refer to problems with a quantitative response
             as regression problems and to those with a categorical response as classification
             problems. A variant of regression and classification methods are the
             so-called tree-based methods. These methods work on the principle of
             segmenting the predictor space into a number of simple regions. To make
             a prediction for a given observation, the tree-based methods usually use
             statistical moments such as mean or variance. As the set of splitting rules
             tosegmentthepredictorspacecanbeconvenientlyrepresentedasatree,it
             visually makes the decision process significantly easier; these types of
             approaches are also referred to as decision trees methods.
          •  Unsupervised learning addresses more challenging situations, where for
             every observation i¼1,…,n, one finds a vector of measurements x i
             but no associated response y i . Hence, it is not possible to fit a linear
             regression model because there is no response variable to predict. In such
             conditions, finding a solution is less transparent and the approach is
             referred to as unsupervised. The two main classes of unsupervised learning
             methods are the cluster analysis or clustering and the so-called dimensionality
             reduction methods. The objective of cluster analysis is to determine, on
             the basis of variables or parameters x 1 , …, x n , whether these observations
             can be classified into relatively distinct groups called clusters and if there is
             a possibility to represent individual clusters with their single
   150   151   152   153   154   155   156   157   158   159   160