calculation of these M-H ratios necessitates a pass through the entire dataset, the scalability of HMC-based algorithms has been limited. This has been addressed recently with the development of stochastic approaches, inspired by the now-ubiquitous SGD-based ERM algorithms, where we omit the M-H correction step and calculate the Hamiltonian gradients over random mini-batches of the training data (Chen, Fox, & Guestrin, 2014; Welling & Teh, 2011). Further improvements to these approaches have been made by incorporating Riemann manifold techniques to learn the critically important Hamiltonian mass matrices, in both the standard HMC (Girolami & Calderhead, 2011) and stochastic (Ma, Chen, & Fox, 2015; Roychowdhury, Kulis, & Parthasarathy, 2016) settings. Following the methods proposed by Girolami and Calderhead (2011), these Riemannian approaches have been shown to noticeably improve the acceptance probabilities of samples, and to dramatically improve the convergence rates in the stochastic formulations as well (Roychowdhury et al., 2016).
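
To make the mini-batch idea concrete, the sketch below (in Python/NumPy) shows one naive stochastic-gradient HMC update in the spirit of Chen, Fox, and Guestrin (2014): the M-H correction is skipped and the gradient of the potential energy is estimated from a random mini-batch. The function name sghmc_step, the friction constant C, the gradient-noise estimate B_hat, and the callback grad_U_minibatch are illustrative assumptions rather than the published implementation, and the mass matrix is reduced to a scalar for simplicity (the Riemannian variants cited above learn it instead).

import numpy as np

def sghmc_step(theta, r, grad_U_minibatch, eps=1e-3, C=1.0, mass=1.0, B_hat=0.0):
    # One naive SGHMC step with no M-H correction: advance the position with
    # the momentum, then update the momentum with a mini-batch estimate of
    # grad U(theta), a friction term C, and injected Gaussian noise of
    # variance 2 * (C - B_hat) * eps.
    theta = theta + eps * r / mass
    noise = np.sqrt(2.0 * (C - B_hat) * eps) * np.random.randn(*np.shape(r))
    r = r - eps * grad_U_minibatch(theta) - eps * C * r / mass + noise
    return theta, r

# Illustrative usage on a toy potential U(theta) = theta^2 / 2 (a standard
# Gaussian posterior); mini-batch noise is mimicked by perturbing the gradient.
grad_U = lambda th: th + 0.1 * np.random.randn(*np.shape(th))
theta, r = np.zeros(1), np.zeros(1)
samples = []
for _ in range(5000):
    theta, r = sghmc_step(theta, r, grad_U)
    samples.append(theta[0])

The friction term exists precisely to absorb the extra noise introduced by the stochastic gradient; the SGHMC analysis requires C to dominate the gradient-noise estimate B_hat for the chain to sample from approximately the right distribution.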
   Preliminary experiments show that HMC does not work well in practice. One can thus identify two challenges with posterior sampling using HMC. First, HMC still has a hard time finding the different modes of the distribution (i.e., escaping the metastable regions of the HMC Markov chain). Second, as stated earlier, the massive dimensionality of deep neural networks means that most of the posterior probability mass resides in models with poor classification accuracy. Fig. 2.2 shows the accuracy of posterior-sampled neural network models as a function of the number of HMC steps for the CIFAR100 image classification task (using a LeNet CNN architecture). In as few as 100 HMC steps, the posterior-sampled models are significantly worse than the best models in both training and validation accuracy. Thus HMC posterior sampling is

Fig. 2.2 HMC samples produce inaccurate neural network models after a few HMC steps in the CIFAR100 image classification task.