Page 48 - Artificial Intelligence for the Internet of Everything

Uncertainty Quantification in Internet of Battlefield Things  35


\hat{y}(x) = g_1 \circ g_2 \circ \cdots \circ g_K(x)                                      (2.9)

and typically one tries to make the distance between the target variable and
the estimator small by minimizing their expected quadratic distance:

\min_{w_1, \ldots, w_K} \; \mathbb{E}_{x,y}\!\left[ \big( y - g_1 \circ g_2 \circ \cdots \circ g_K(x) \big)^2 \right]          (2.10)
where each w_k is a vector whose length depends on the number of
"neurons" at each layer of the network. This operation may be thought of as
an iterated generalization of a convolutional filter. Additional complexities
can be added at each layer, such as aggregating the values output by the
activation functions by their maximum (max pooling) or average. But the
training procedure is similar: minimize a variant of the highly nonconvex,
high-dimensional stochastic program (Eq. 2.10). Due to the problem's high
dimensionality, efforts to modify nonconvex stochastic optimization algorithms
to be amenable to parallel computing architectures have gained salience in
recent years. An active area of research is the interplay between parallel
stochastic algorithms and scientific computing to minimize the clock time
required for training neural networks; see Lian, Huang, Li, and Liu (2015),
Mokhtari, Koppel, Scutari, and Ribeiro (2017), and Scardapane and
Di Lorenzo (2017). Thus far, efforts have been restricted to attaining
computational speedup through parallelization while guaranteeing convergence
to a stationary point, although some preliminary efforts to escape saddle
points and ensure convergence to a local minimizer have also recently
appeared (Lee, Simchowitz, Jordan, & Recht, 2016); these modify convex
optimization techniques, for instance, by replacing indefinite Hessians with
positive definite approximate Hessians (Paternain, Mokhtari, & Ribeiro, 2017).
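As a concrete illustration of Eqs. (2.9) and (2.10), the following sketch trains a two-layer composition g_1 ∘ g_2 by stochastic gradient descent on the squared loss. The architecture (four tanh "neurons"), the data distribution (a noisy sine), and the step size are all illustrative assumptions, not choices made in the chapter.

```python
import numpy as np

# Hypothetical two-layer network y_hat(x) = g_1(g_2(x)), where g_2 is a
# tanh layer with weights W1 and g_1 is a linear readout with weights W2.
# Trained by single-sample SGD on the quadratic loss of Eq. (2.10).

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 1))   # g_2: input -> 4 neurons
W2 = rng.normal(scale=0.5, size=(1, 4))   # g_1: 4 neurons -> output
lr = 0.05                                 # step size (assumed)

def forward(x):
    h = np.tanh(W1 @ x)     # inner map g_2
    return W2 @ h, h        # outer linear map g_1

for step in range(2000):
    # Draw one sample (x, y); here y = sin(x) + small noise (assumed data)
    x = rng.uniform(-2, 2, size=(1, 1))
    y = np.sin(x) + 0.01 * rng.normal()
    y_hat, h = forward(x)
    err = y_hat - y                         # gradient of the quadratic loss
    # Backpropagation: chain rule through the composition g_1 ∘ g_2
    gW2 = err @ h.T
    gW1 = (W2.T @ err) * (1 - h**2) @ x.T
    W2 -= lr * gW2
    W1 -= lr * gW1
```

After training, the predictions roughly track sin(x) on [-2, 2]; replacing the single sample per step with a minibatch, or distributing minibatches across workers, recovers the parallel stochastic schemes discussed above.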


2.4.3 Uncertainty Quantification in Deep Neural Networks

In this section we discuss UQ in neural networks through Bayesian methods,
more specifically, posterior sampling. Hamiltonian Monte Carlo (HMC) is
the best current approach to performing posterior sampling in neural
networks, and it is the foundation from which all other existing approaches
are derived. HMC is an MCMC method (Brooks, Gelman, Jones, & Meng,
2011) that has been a popular tool in the ML literature for sampling from
complex probability distributions when random-walk-based first-order
Langevin samplers do not exhibit the desired convergence behavior. Standard
HMC approaches are designed to propose candidate samples for a Metropolis-
Hastings-based acceptance scheme with high acceptance probabilities; since
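The mechanics described above can be sketched in a few lines: simulate Hamiltonian dynamics with a leapfrog integrator, then accept or reject the endpoint with a Metropolis-Hastings test. The target below is a toy 2-D standard Gaussian standing in for a network's weight posterior; the step size, trajectory length, and target are assumptions for illustration only.

```python
import numpy as np

def log_prob(q):             # unnormalized log-density of N(0, I) (toy target)
    return -0.5 * np.dot(q, q)

def grad_log_prob(q):        # its gradient
    return -q

def hmc_step(q, rng, step_size=0.1, n_leapfrog=20):
    """One HMC transition: leapfrog dynamics + Metropolis-Hastings test."""
    p = rng.normal(size=q.shape)            # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new
        p_new += step_size * grad_log_prob(q_new)
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    # Accept with probability min(1, exp(H_old - H_new))
    h_old = -log_prob(q) + 0.5 * np.dot(p, p)
    h_new = -log_prob(q_new) + 0.5 * np.dot(p_new, p_new)
    if rng.uniform() < np.exp(h_old - h_new):
        return q_new
    return q

rng = np.random.default_rng(1)
q = np.zeros(2)
samples = []
for i in range(3000):
    q = hmc_step(q, rng)
    if i >= 500:                 # discard burn-in
        samples.append(q.copy())
samples = np.array(samples)
```

Because the leapfrog integrator nearly conserves the Hamiltonian, the acceptance probability stays high even for distant proposals, which is exactly the property the text attributes to standard HMC.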