Page 165 - Intelligent Digital Oil And Gas Fields

Components of Artificial Intelligence and Data Analytics     127


              Statisticians have introduced various techniques to improve the
              statistical learning capabilities of decision trees, such as bootstrap
              aggregation, or bagging (James et al., 2014), which fits a separate tree to
              each of many bootstrap samples of the training data and averages their
              predictions. However, while bagging dramatically improves the prediction
              accuracy of decision trees, it comes at the expense of interpretability.
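To make the bagging procedure concrete, the following Python sketch fits one tree per bootstrap sample and averages the predictions. It is a minimal illustration under stated assumptions, not the method used in this book: the base learner is a depth-one regression tree (a "stump") on a single numeric predictor, chosen only to keep the example short, and the names `fit_stump`, `predict_stump`, `bagged_stumps`, and `predict_bagged` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-one regression tree (a "stump") on one numeric predictor.

    Returns (threshold, left_mean, right_mean); threshold is None when the
    sample has no two distinct x values, and the stump then predicts a constant.
    """
    pairs = sorted(zip(xs, ys))
    best_sse, best = float("inf"), (None, mean(ys), mean(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse, best = sse, (threshold, lm, rm)
    return best


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:
        return left_mean  # constant model
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Bagging: fit one stump to each bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Average the predictions of the individual trees."""
    return mean(predict_stump(m, x) for m in models)


# Made-up data: the response jumps from 0 to 10 at x = 5.
models = bagged_stumps(list(range(10)), [0] * 5 + [10] * 5)
print(predict_bagged(models, 1), predict_bagged(models, 8))
```

The averaged predictions should sit close to 0 and 10, respectively. The averaged model, however, no longer corresponds to a single readable tree, which is the loss of interpretability mentioned above.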
                 The RF approach improves on bagged trees by de-correlating the trees,
              which reduces the variance when the trees are averaged. When the decision
              trees are built (they are generated in parallel), each time a split in a
              tree is considered, a random selection of m predictors is chosen as split
              candidates from the full set of p predictors. Because a fresh selection of
              m predictors is drawn at each split, a single strong predictor cannot
              dominate the top splits of every tree, so the trees become less correlated.
              One typically chooses $m \approx \sqrt{p}$; that is, the number of
              predictors considered at each split (m) is approximately equal to the
              square root of the total number of predictors, p.
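The only change RF makes to the split search of a bagged tree is the random candidate subset. The sketch below, assuming numeric predictors and a squared-error (regression) split criterion, draws a fresh subset of $m \approx \sqrt{p}$ predictor indices every time it searches for a split. The names `n_split_candidates`, `sse`, and `best_split_rf` are hypothetical, and rounding $\sqrt{p}$ to the nearest integer (with a floor of 1) is only one convention; other implementations may truncate instead.

```python
import math
import random


def n_split_candidates(p):
    """m ~ sqrt(p): predictors tried at each split (rounded, at least 1)."""
    return max(1, round(math.sqrt(p)))


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def best_split_rf(rows, ys, rng):
    """Best squared-error split, searching only a random subset of m predictors.

    rows: list of numeric feature vectors, all of length p; ys: responses.
    Returns (sse, predictor_index, threshold, candidates), or None if no
    candidate predictor can be split.
    """
    p = len(rows[0])
    # A fresh random subset is drawn every time a split is considered.
    candidates = rng.sample(range(p), n_split_candidates(p))
    best = None
    for j in candidates:
        order = sorted(range(len(rows)), key=lambda i: rows[i][j])
        for k in range(1, len(order)):
            lo, hi = rows[order[k - 1]][j], rows[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            left = [ys[i] for i in order[:k]]
            right = [ys[i] for i in order[k:]]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, (lo + hi) / 2, candidates)
    return best


print(n_split_candidates(16))  # → 4
```

With p = 9 predictors, for example, each call to `best_split_rf` examines only 3 of them, and a different 3 may be drawn at the next split.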
                 The predictor variables for the RF method can be of any type: numerical,
              categorical, continuous, or discrete. The hierarchical structure of trees
              means that the method automatically includes interactions among the
              predictor variables in the model. Because RF trees are insensitive to
              skewed distributions (i.e., they do not require transformation into a
              normal-score domain), outliers, and missing values (i.e., data imputation
              is less often required), RF is considered one of the most efficient
              “off-the-shelf” predictive ML techniques.
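The insensitivity to skewed predictor distributions follows from the fact that a tree split depends only on the ordering of the predictor values. The sketch below, assuming a single numeric predictor and a squared-error split criterion (`best_partition` is a hypothetical helper, and the data are made up), finds the best split of a strongly right-skewed predictor and of its logarithm. Because the logarithm is strictly increasing, both searches send the same samples to the left branch, so no normal-score transform is needed; the extreme predictor value 5000 has no influence beyond its rank.

```python
import math


def best_partition(xs, ys):
    """Indices sent to the left branch by the best squared-error split on xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])

    def sse(v):
        mu = sum(v) / len(v)
        return sum((a - mu) ** 2 for a in v)

    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold separates equal values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        score = sse(left) + sse(right)
        if score < best_score:
            best_score, best_left = score, frozenset(order[:k])
    return best_left


# Made-up, strongly right-skewed predictor values and responses.
xs = [1, 2, 3, 5, 8, 40, 300, 5000]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
raw = best_partition(xs, ys)
logged = best_partition([math.log(x) for x in xs], ys)
print(sorted(raw), raw == logged)  # → [0, 1, 2, 3] True
```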


              4.2.3 Visualization and Interactivity
              This section summarizes, with examples, selected diagrams, graphs, and
              images for qualitative and quantitative visualization, primarily of
              predictive analytics, pertaining to statistical learning, ML, and
              multivariate analysis (Fig. 4.10).
                 We did not have room to include examples of visualization for descriptive
              analytics (e.g., EDA with uni- and bivariate statistics, such as histograms,
              statistical distributions, box plots, Q-Q plots, cross-plots, and
              correlation/covariance). For more information on these, see Gelman and
              Hill (2007) and Seltman (2015).
                 Fig. 4.11 shows an example of an interactive analytical dashboard
              visualization for DOF applications. The figure adds two items to the list
              of selected visualization options: the tornado chart, which ranks the
              predictor variables by their importance with respect to the response
              variable, and the receiver operating characteristic (ROC) curve, which
              plots the sensitivity (true-positive rate) of a binary classifier against
              its false-alarm probability (false-positive rate).
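Both charts reduce to simple computations. The following sketch, with hypothetical helper names (`roc_points`, `auc`, `tornado_rows`) and made-up inputs, sweeps the decision threshold over a binary classifier's scores to obtain the ROC points (false-positive rate, true-positive rate), integrates them with the trapezoidal rule to get the area under the curve (AUC), and orders predictor importances for a tornado chart with the largest bar on top. It assumes that both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    scores: classifier scores (higher means "more likely positive").
    labels: 1 for positive, 0 for negative; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tornado_rows(importance):
    """Order predictors for a tornado chart: largest |importance| at the top."""
    return sorted(importance.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(round(auc(pts), 4))  # → 0.8889

# Made-up importances for hypothetical predictors x1..x3.
for name, value in tornado_rows({"x1": 0.12, "x2": -0.45, "x3": 0.30}):
    print(f"{name:>3} {'#' * round(abs(value) * 20)}")
```

A classifier whose curve hugs the upper-left corner (AUC near 1) detects most positives at a low false-alarm rate, whereas the diagonal (AUC of 0.5) corresponds to random guessing.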