Artificial Intelligence for the Internet of Everything
Uncertainty Quantification in Internet of Battlefield Things
\hat{y}(x) = g_1 \circ g_2 \circ \cdots \circ g_K(x) \qquad (2.9)
and typically one tries to make the distance between the target variable and
the estimator small by minimizing their quadratic distance:
\min_{w_1,\ldots,w_K} \; \mathbb{E}_{x,y}\!\left[\left(y - g_1 \circ g_2 \circ \cdots \circ g_K(x)\right)^2\right] \qquad (2.10)
where each w k is a vector whose length depends on the number of
“neurons” at each layer of the network. This operation may be thought
of as an iterated generalization of a convolutional filter. Additional complexities
can be added at each layer, such as aggregating the values output by the
activation functions by their maximum (max pooling) or their average. But the
training procedure is similar: minimize a variant of the highly nonconvex,
high-dimensional stochastic program (Eq. 2.10). Due to their high dimen-
sionality, efforts to modify nonconvex stochastic optimization algorithms to
be amenable to parallel computing architectures have gained salience in
recent years. An active area of research is the interplay between parallel sto-
chastic algorithms and scientific computing to minimize the clock time
required for training neural networks—see Lian, Huang, Li, and Liu
(2015), Mokhtari, Koppel, Scutari, and Ribeiro (2017), and Scardapane
and Di Lorenzo (2017). Thus far, efforts have been restricted to attaining
computational speedup by parallelization to convergence at a stationary
point, although some preliminary efforts to escape saddle points and ensure
convergence to a local minimizer have also recently appeared (Lee, Simcho-
witz, Jordan, & Recht, 2016); these modify convex optimization tech-
niques, for instance, by replacing indefinite Hessians with positive
definite approximate Hessians (Paternain, Mokhtari, & Ribeiro, 2017).
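As a concrete illustration of Eqs. (2.9) and (2.10), the following NumPy sketch trains a two-layer network, i.e., the composition g_1 ∘ g_2, by stochastic gradient descent on the empirical squared loss. The architecture, data, and hyperparameters are illustrative assumptions made here, not code from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression problem: learn y = sin(x) on [-pi, pi].
X = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(X)

# Two-layer network y_hat(x) = g1(g2(x)): g2 is an affine map followed by
# tanh, g1 an affine readout. The w_k of Eq. (2.10) are (W2, b2, W1, b1).
H = 16
W2 = rng.normal(0.0, 1.0, size=(1, H))
b2 = np.zeros(H)
W1 = rng.normal(0.0, 0.1, size=(H, 1))
b1 = np.zeros(1)

lr = 0.05
for step in range(2000):
    # Minibatch sample approximates the expectation in the
    # stochastic program (Eq. 2.10).
    idx = rng.integers(0, len(X), size=32)
    xb, yb = X[idx], y[idx]

    # Forward pass: the composition g1 ∘ g2.
    h = np.tanh(xb @ W2 + b2)   # g2(x)
    pred = h @ W1 + b1          # g1(g2(x))

    # Gradients of the mean squared error via backpropagation.
    err = (pred - yb) / len(xb)
    gW1 = h.T @ err
    gb1 = err.sum(axis=0)
    dh = (err @ W1.T) * (1.0 - h ** 2)
    gW2 = xb.T @ dh
    gb2 = dh.sum(axis=0)

    # Plain SGD step on each layer's parameters.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(X @ W2 + b2) @ W1 + b1 - y) ** 2))
print(mse)
```

Replacing the plain SGD update with a parallel or variance-reduced variant is precisely where the distributed methods cited above enter; the nonconvexity of the loss in (W1, W2) is what makes stationary points, rather than global minima, the standard convergence guarantee.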
2.4.3 Uncertainty Quantification in Deep Neural Networks
In this section we discuss UQ in neural networks through Bayesian methods,
more specifically, posterior sampling. Hamiltonian Monte Carlo (HMC) is
currently the best approach to posterior sampling in neural networks, and it
is the foundation from which all other existing approaches are derived. HMC
is an MCMC method (Brooks, Gelman, Jones, & Meng,
2011) that has been a popular tool in the ML literature to sample from com-
plex probability distributions when random walk-based first-order Langevin
samplers do not exhibit the desired convergence behaviors. Standard HMC
approaches are designed to propose candidate samples for a Metropolis-Hastings-based acceptance scheme with high acceptance probabilities; since