2.4 UNCERTAINTY QUANTIFICATION IN MACHINE LEARNING
Most standard ML algorithms give only a single output: the predicted value ŷ. While this is a primary goal of ML, as discussed in Section 2.1, in many scenarios it is also important to have a measure of model confidence. In particular, we would like the model to take into account variability due to new observations being "far" from the data on which the model was trained. This is particularly relevant for tactical applications in which human decision makers rely on the confidence of the predictive model to make actionable decisions. Unfortunately, this area has not been widely developed. We explore two approaches to UQ in the context of ML: explicit and implicit uncertainty measures.
By explicit measures we mean methods that, in addition to a model's prediction ŷ, perform a separate computation to determine the model's confidence in that particular point. These methods often measure some kind of distance between a new data point and the training data. New data that are far away from the training data give reason to proceed with caution. A naive way to measure confidence explicitly would be to output an indicator variable that tells whether the new data point falls within the convex hull of the training data. If a point does not fall within the convex hull, the user would have reason to be suspicious of that prediction. More sophisticated methods can be applied using modern outlier-detection theory (model-based methods, proximity-based methods, etc.). In particular, these methods can give more indicative measures of confidence, as opposed to a simple 1 or 0, and are more robust to outliers within the training data.
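As a rough illustration, and not part of the original text, the following Python sketch shows both ideas: a binary in/out-of-convex-hull check via scipy, and a continuous proximity-based novelty score via scikit-learn's LocalOutlierFactor. The data, variable names, and library choices are assumptions made purely for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))            # hypothetical training inputs
X_new = np.array([[0.1, -0.2], [4.0, 4.0]])    # one "near" point, one "far" point

# Naive explicit measure: is the new point inside the convex hull of the
# training data?  find_simplex returns -1 for points outside the hull.
hull = Delaunay(X_train)
inside_hull = hull.find_simplex(X_new) >= 0

# Proximity-based measure: a continuous novelty score instead of a 0/1 flag.
# Higher (less negative) scores indicate points closer to the training data.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)
novelty_score = lof.score_samples(X_new)

for x, flag, score in zip(X_new, inside_hull, novelty_score):
    print(f"point {x}: in hull = {flag}, LOF score = {score:.2f}")
```

The "far" point falls outside the hull and receives a much lower proximity score, which is the kind of graded confidence signal the text describes.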
Another approach to UQ is to incorporate the uncertainty arising from new data points into the ML model implicitly. A natural way of doing this is a Bayesian approach: we can use a Gaussian process (GP) to model our beliefs about the function we wish to learn. Predictions then have large variance in regions where little data have been observed, and smaller variance in regions where the observed data are more dense. However, for large, high-dimensional datasets, GPs become difficult to fit. Promising current methods that approximate GPs include variational inference, dropout neural networks (Das, Roy, & Sambasivan, 2017), and neural network ensembles (Lakshminarayanan, Pritzel, & Blundell, 2017).
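As a minimal sketch of this behavior, and not from the original text, the following fits scikit-learn's GaussianProcessRegressor on hypothetical one-dimensional data and shows how the predictive standard deviation grows away from the training data; the kernel choice and data are illustrative assumptions only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical 1-D training data clustered in [0, 5]
X_train = rng.uniform(0, 5, size=(30, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=30)

# GP with an RBF kernel plus a noise term
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

# Predictive mean and standard deviation on a grid extending beyond the data
X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)

# The standard deviation is small near the training data (x <= 5)
# and grows in the extrapolation region (x > 5).
for x, m, s in zip(X_test.ravel(), mean, std):
    print(f"x = {x:4.1f}: mean = {m:6.2f}, std = {s:5.2f}")
```

The growing predictive standard deviation outside the training region is the implicit uncertainty measure described above: no separate distance computation is needed, because the GP's posterior variance already encodes how far a query point lies from the observed data.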