Page 50 - Artificial Intelligence for the Internet of Everything
Uncertainty Quantification in Internet of Battlefield Things 37
impractical for deep neural networks, even when the HMC acceptance probability remains high throughout the experiment.
Exploring fixes for mode exploration remains an open avenue of research. Traditional MCMC methods, such as annealing and annealed importance sampling, can be used here. Less traditional methods, such as stochastic initialization and model perturbation, are also being explored.
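To make the annealing idea concrete, the following is a minimal, hypothetical sketch (a one-dimensional bimodal toy target, not the models discussed in this chapter): tempering the target flattens the barrier between posterior modes, so a plain random-walk Metropolis chain can cross between them.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    # Bimodal toy target: mixture of two well-separated Gaussians at +/-4.
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

def metropolis(log_target, x0, n_steps=5000, step=1.0):
    """Random-walk Metropolis; returns the chain of visited states."""
    x, chain = x0, []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        chain.append(x)
    return np.array(chain)

# Tempered target (beta = 0.1): the log-barrier between modes shrinks by a
# factor of 10, so the chain crosses it; tempered samples can then seed
# chains on the untempered target, one per discovered mode.
flat_chain = metropolis(lambda x: 0.1 * log_p(x), x0=4.0)
print("visits both modes:", (flat_chain > 0).any() and (flat_chain < 0).any())
```

On the untempered target the same chain started at one mode essentially never reaches the other, which is exactly the mode-exploration failure that annealing addresses.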
Regarding the important challenge of sampling accurate models from a given posterior mode, mini-batch stochastic gradient Langevin dynamics (SGLD) (Welling & Teh, 2011) is increasingly credited as a practical Bayesian method for training neural networks to find good generalization regions (Chaudhari et al., 2017), and it may help improve parallel SGD algorithms (Chaudhari et al., 2017). The connection between SGLD and SGD has been explored by Mandt, Hoffman, and Blei (2017) for posterior sampling in small regions around a locally optimal solution. To make this procedure a legitimate posterior sampling approach, we explore the use of Chaudhari et al.'s (2017) methods to smooth out local minima and significantly extend the reach of Mandt et al.'s (2017) posterior sampling approach.
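As an illustrative sketch of the SGLD update of Welling and Teh (2011), the toy example below (a hypothetical Bayesian linear regression, not a deep network) applies the mini-batch Langevin step: a stochastic gradient step on the log posterior, with the mini-batch likelihood gradient rescaled by N/n, plus injected Gaussian noise whose variance equals the step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N points, mini-batches of n, for a 3-dimensional linear model.
N, n, dim = 1000, 32, 3
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, dim))
y = X @ true_w + 0.1 * rng.normal(size=N)

def grad_log_prior(w):
    # Standard normal prior on the weights.
    return -w

def grad_log_lik(w, Xb, yb):
    # Gaussian likelihood with noise sigma = 0.1, i.e., precision 100.
    return 100.0 * Xb.T @ (yb - Xb @ w)

w = np.zeros(dim)
samples = []
eps = 1e-5  # step size; the full algorithm decays this over time
for t in range(2000):
    idx = rng.choice(N, size=n, replace=False)
    # Mini-batch estimate of the full log-posterior gradient.
    grad = grad_log_prior(w) + (N / n) * grad_log_lik(w, X[idx], y[idx])
    # SGLD: half a gradient step plus N(0, eps) noise per coordinate.
    w = w + 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=dim)
    if t >= 1000:  # discard burn-in; keep approximate posterior samples
        samples.append(w.copy())

posterior_mean = np.mean(samples, axis=0)
```

The retained iterates approximate draws from the posterior around the mode the chain has settled into, which is the "small region around a locally optimal solution" regime analyzed by Mandt et al. (2017).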
This smoothing out has connections to Riemannian curvature methods for exploring the energy function in the parameter (weight) space (Poole, Lahiri, Raghu, Sohl-Dickstein, & Ganguli, 2016). The Hessian is used as a measure of curvature by Fawzi, Moosavi-Dezfooli, Frossard, and Soatto (2017) to empirically explore the energy function of a learned model with regard to examples (i.e., the curvature with regard to the input space rather than the parameter space). This approach is also related to the implicit regularization arguments of Neyshabur, Tomioka, Salakhutdinov, and Srebro (2017).
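A hedged illustration of the input-space curvature measure: the sketch below uses an invented two-weight "model" standing in for a trained network, estimates the Hessian of its loss with respect to the input by finite differences, and reads off the top eigenvalue as the curvature around that example.

```python
import numpy as np

# Hypothetical fixed weights: a stand-in for a trained model whose loss we
# probe at a single example (curvature in input space, not weight space).
W = np.array([[1.0, -0.5], [0.3, 0.8]])

def loss(x):
    return float(np.sum(np.tanh(W @ x) ** 2))

def hessian_fd(f, x, h=1e-4):
    """Finite-difference Hessian of f at x: mixed second differences."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = h
            e_j = np.zeros(d); e_j[j] = h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i)
                       - f(x + e_j) + f(x)) / h**2
    return H

x0 = np.array([0.2, -0.1])
H = hessian_fd(loss, x0)
# The top eigenvalue summarizes how sharply the loss curves around x0.
top_curvature = np.linalg.eigvalsh(H).max()
```

A large top eigenvalue indicates a sharply curved loss surface around the example, the kind of empirical curvature probe used in the input-space analyses cited above.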
There is a need to develop an alternative SGD-type method for accurate posterior sampling of deep neural network models that is capable of providing the all-important UQ in the decision-making problem in C3I systems. Not surprisingly, a system that correctly quantifies the probability that a suggested decision is incorrect inspires more confidence than a system that incorrectly believes itself to always be correct; the latter is a common ailment of deep neural networks. Moreover, a general, practical Bayesian neural network method would help provide robustness against adversarial attacks (as the attacker must attack a family of models rather than a single model), reduce generalization error via posterior-sampled ensembles, and provide better quantification of classification accuracy and root mean square error (RMSE).
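A minimal sketch of the posterior-sampled ensemble idea, using hypothetical random weight draws in place of actual posterior samples: class probabilities are averaged across the sampled models, and the entropy of the averaged prediction serves as the uncertainty estimate for the suggested decision.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-in for posterior samples of a classifier's weights:
# each "model" is a linear map from 4 features to 3 classes plus softmax.
posterior_weights = [rng.normal(size=(4, 3)) for _ in range(20)]

def predictive(x, weight_samples):
    """Monte Carlo estimate of p(y | x, data): average the class
    probabilities produced by each posterior-sampled model."""
    probs = np.stack([softmax(x @ W) for W in weight_samples])
    return probs.mean(axis=0)

x = rng.normal(size=4)
p = predictive(x, posterior_weights)
# Predictive entropy: high when the ensemble disagrees or is unsure,
# i.e., when the suggested decision is more likely to be incorrect.
entropy = -np.sum(p * np.log(p + 1e-12))
```

An adversary must now fool most of the sampled models at once rather than one fixed model, and the averaged prediction typically generalizes better than any single member, which is the ensemble benefit noted above.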