Page 165 - Intelligent Digital Oil And Gas Fields
Components of Artificial Intelligence and Data Analytics
Statisticians have introduced various techniques to improve the
statistical learning capabilities of decision trees, such as bootstrap
aggregation, or bagging (James et al., 2014); however, while bagging
dramatically improves the prediction accuracy of decision trees, it comes at
the expense of interpretability.
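As a minimal sketch of the bagging idea, assuming one-split regression trees (stumps) as the base learner and hypothetical toy data, the following Python fragment fits each stump to a bootstrap sample and averages the predictions. The helper names (`fit_stump`, `bagged_stumps`, `predict_bagged`) are illustrative and do not come from the text. It also hints at why interpretability suffers: the final model is an average over many trees rather than a single readable tree.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:            # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                          # all x equal: predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]                           # (threshold, left mean, right mean)

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def bagged_stumps(xs, ys, n_trees, rng):
    """Fit n_trees stumps, each on a bootstrap sample drawn with replacement."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Bagged prediction: the average over all fitted stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical toy data: a step in the response at x = 5.
xs = [float(i) for i in range(10)]
ys = [0.0 if x < 5 else 1.0 for x in xs]
stumps = bagged_stumps(xs, ys, n_trees=50, rng=random.Random(0))
print(predict_bagged(stumps, 0.0), predict_bagged(stumps, 9.0))
```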
The RF approach provides an improvement over the bagged trees by
de-correlating the trees, which reduces the variance when the trees are
averaged. When building the decision trees (which are generated in parallel),
each time a split in a tree is considered, a random selection of m predictors
is chosen as the subset of split candidates from the full set of p predictors.
A new selection of m predictors is generated at each split, and one typically
chooses $m \approx \sqrt{p}$, which means that the number of predictors
considered at each split (m) is approximately equal to the square root of the
total number of predictors, p.
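The per-split sampling described above can be sketched as follows. This is an illustration only, assuming m is taken as the rounded square root of p; the function names and the predictor names `x0`, `x1`, ... are hypothetical.

```python
import math
import random

def n_split_candidates(p):
    """Common rule of thumb: m is about sqrt(p), and at least 1."""
    return max(1, round(math.sqrt(p)))

def draw_split_candidates(predictors, rng):
    """Draw a fresh random subset of m predictors for a single split."""
    m = n_split_candidates(len(predictors))
    return rng.sample(predictors, m)

predictors = [f"x{i}" for i in range(16)]     # hypothetical names, p = 16
rng = random.Random(0)
for split in range(3):
    # Each split sees its own random subset of m = 4 candidates.
    print(split, draw_split_candidates(predictors, rng))
```

Because a different subset is drawn at every split, a single dominant predictor cannot appear at the top of every tree, which is what de-correlates the trees before they are averaged.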
The predictor variables for the RF method can be of any type: numerical,
categorical, continuous, or discrete. The method automatically includes
interactions among the predictor variables in the model because of the
hierarchical structure of trees. Because RF trees are insensitive to
skewed distributions (i.e., they do not require mapping into normal score
domains), outliers, and missing values (i.e., data imputation methods are
needed less often), they are considered one of the most efficient
“off-the-shelf” predictive ML techniques.
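One way to see the insensitivity to skewed distributions is that a tree split depends only on the ordering of a predictor's values, so a monotone transform such as a logarithm leaves the chosen partition unchanged. The sketch below illustrates this for a single squared-error split on hypothetical data; `best_split_partition` is an illustrative helper, not a function from the text or from any library.

```python
import math

def best_split_partition(xs, ys):
    """Return the set of sample indices sent left by the best single split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = None, None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue                          # cannot split between equal values
        left, right = order[:k], order[k:]
        lm = sum(ys[i] for i in left) / len(left)
        rm = sum(ys[i] for i in right) / len(right)
        sse = (sum((ys[i] - lm) ** 2 for i in left)
               + sum((ys[i] - rm) ** 2 for i in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Hypothetical, strongly right-skewed predictor and a step-like response.
xs = [1.0, 2.0, 3.0, 50.0, 400.0, 9000.0]
ys = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0]

raw = best_split_partition(xs, ys)
logged = best_split_partition([math.log(x) for x in xs], ys)
print(raw == logged)    # the same samples go left with or without the transform
```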
4.2.3 Visualization and Interactivity
This section presents a summary and examples of selected diagrams, graphs,
and images for qualitative and quantitative visualization of primarily
predictive analytics, pertaining to statistical learning, ML, and multivariate
analysis (Fig. 4.10).
We did not have room to include examples for visualization of descriptive
analytics (e.g., EDA with uni- and bivariate statistics, such as histograms,
statistical distributions, box plots, QQ-plots, cross-plots, and
correlation/covariance). For more information on these, see Gelman and Hill
(2007) and Seltman (2015).
Fig. 4.11 shows an example of an interactive analytical dashboard
visualization for DOF applications. The figure adds the tornado chart (ranking
the importance of predictor variables with respect to the response variable)
and the receiver operating characteristic (ROC) curve (sensitivity of the
binary classifier versus false-alarm probability) to the list of selected
visualization options.
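To make the ROC description concrete, the following sketch computes the points of an ROC curve, pairing the false-alarm rate with the sensitivity at each score threshold, and the area under the curve by the trapezoidal rule. The labels and scores are hypothetical, and the function names `roc_points` and `auc` are illustrative rather than taken from the text or from a dashboard tool.

```python
def roc_points(labels, scores):
    """Return (false_alarm_rate, sensitivity) pairs, one per score threshold."""
    pos = sum(labels)                         # number of true positives available
    neg = len(labels) - pos                   # number of true negatives available
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Classify as positive every sample whose score is at least t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical binary labels (1 = event) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

pts = roc_points(labels, scores)
print(pts)
print(auc(pts))
```

A curve that hugs the upper-left corner (high sensitivity at a low false-alarm rate) indicates a better classifier; the diagonal corresponds to random guessing.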