Page 164 - Intelligent Digital Oil And Gas Fields


          we are solving a classification problem where the observation data fall into
          two classes, colored pink and blue in Fig. 4.9A. The support vector classifier
          seeks a linear boundary between the two classes of observed data points and
          therefore performs quite poorly here (Fig. 4.9B).
             When the SVM method, an extension of the support vector classifier, is
          applied, the classification feature space is rearranged in a specific way using
          nonlinear functions known as kernels. If an SVM with a polynomial kernel of
          the third degree is applied to the nonlinear distribution of data points shown
          in Fig. 4.9A, the result is the significantly better-fitting classification
          presented in Fig. 4.9C, which yields better decisions. If the SVM is instead
          applied with the radial basis kernel, the classification/decision boundary is
          captured even more accurately (Fig. 4.9D).
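The comparison above can be sketched in code. This is a minimal illustration (not from the book), assuming scikit-learn is available; the two-moons data set stands in for any nonlinearly separable two-class problem:

```python
# Sketch: a linear support vector classifier versus kernel SVMs on a
# nonlinearly separable two-class data set (illustrative, not from the text).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moon classes: no straight line separates them well.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

linear = SVC(kernel="linear").fit(X, y)          # linear boundary, cf. Fig. 4.9B
poly3 = SVC(kernel="poly", degree=3).fit(X, y)   # 3rd-degree polynomial kernel, cf. Fig. 4.9C
rbf = SVC(kernel="rbf").fit(X, y)                # radial basis kernel, cf. Fig. 4.9D

for name, model in [("linear", linear), ("poly deg 3", poly3), ("rbf", rbf)]:
    print(f"{name:10s} training accuracy: {model.score(X, y):.2f}")
```

On such data the radial basis kernel typically recovers the curved boundary that the linear classifier cannot, mirroring the progression from Fig. 4.9B to Fig. 4.9D.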
             When applied to statistical regression problems, the SVM method is
          referred to as support vector regression (SVR). The two techniques are closely
          related and differ only in the class of problems to which they are applied. In
          the case of SVR, the regression function usually has the form (Zhong et al., 2015)

          f(x) = y = \sum_{i=1}^{N} \left( \alpha_i^{*} - \alpha_i \right) \left( v_i^{t} x + 1 \right)^{p} + b \qquad (4.4)
          where v_1, …, v_N are the N support vectors and b, p, \alpha_i, and \alpha_i^{*} are the
          parameters of the model, which are optimized with respect to the ε-insensitive
          loss (Zhong et al., 2015). During parameter estimation, the N support vectors
          are selected from the training data set. As with classification problems,
          nonlinear regression is solved by applying a kernel function. For reference,
          the radial basis kernel function (mentioned previously for classification)
          takes the form:

          K(v_i, x) = \exp\left( -\gamma \left\| v_i - x \right\|^{2} \right) \qquad (4.5)
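The kernel of Eq. (4.5) is straightforward to write out directly. The sketch below (not from the book) implements it in NumPy and fits an SVR with scikit-learn's radial basis kernel; the sine-curve data and the value of γ are arbitrary illustrations:

```python
# Sketch: the radial basis kernel of Eq. (4.5) in NumPy, and SVR with
# an RBF kernel via scikit-learn. Data and gamma are illustrative only.
import numpy as np
from sklearn.svm import SVR

def rbf_kernel(v_i, x, gamma=0.5):
    """K(v_i, x) = exp(-gamma * ||v_i - x||^2), as in Eq. (4.5)."""
    return np.exp(-gamma * np.sum((v_i - x) ** 2))

# One-dimensional regression example: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

model = SVR(kernel="rbf", gamma=0.5, epsilon=0.1).fit(X, y)
print("support vectors selected:", model.support_vectors_.shape[0])
print("training R^2:", round(model.score(X, y), 3))
```

Note that the fitted model exposes the support vectors selected from the training set, matching the description of parameter estimation above.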

          4.2.2.3 Random Forest
          Random forest (RF) is an ensemble ML method that constructs a large
          number of decorrelated decision trees, each grown on a random selection of
          predictor variables, and averages their predictions. For an in-depth
          introduction to the concept of decision trees, see James et al. (2014). In their
          fundamental formulation, decision trees have proven very successful in solving
          classification problems of statistical learning; however, they are less efficient
          for nonlinear regression.
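The two ingredients named above, random predictor selection per split and averaging over trees, can be made concrete with a short sketch (not from the book), assuming scikit-learn; the data set and settings are illustrative only:

```python
# Sketch: a random forest as an average of decorrelated trees
# (illustrative example, assuming scikit-learn).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 4))
# Nonlinear target depending on two of the four predictor variables.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(300)

# max_features limits the predictors considered at each split, which
# decorrelates the individual trees before their outputs are averaged.
forest = RandomForestRegressor(
    n_estimators=200, max_features=2, random_state=0
).fit(X, y)

# The ensemble prediction is the mean over the individual trees.
x_new = np.array([[0.5, -1.0, 0.2, 2.0]])
per_tree = [t.predict(x_new)[0] for t in forest.estimators_]
print("mean of the 200 trees:", round(float(np.mean(per_tree)), 4))
print("forest prediction:   ", round(float(forest.predict(x_new)[0]), 4))
```

The two printed values coincide, showing that the forest's regression output is literally the average of its constituent trees.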