gradient descent (Nesterov, 2004). Currently, NAGD has received little attention in the stochastic setting (though see Yang et al., 2016). We would like to extend NAGD to several problem settings in a stochastic context. Additionally, we would like to incorporate adaptive sampling (using varying batch sizes) into a stochastic NAGD. Adaptive sampling has shown promise, yet has not received much attention.

The development of a more efficient SGD algorithm would reduce uncertainty in the training of ML models at a faster rate than current algorithms. This is especially important for online learning, where it is crucial that algorithms adapt efficiently to newly observed data in the field.


              2.3.6 Generalized Linear Models

The first and simplest choice of estimator function class is $\mathcal{F} = \mathbb{R}^p$. In this case, the estimator is a generalized linear model (GLM): $\hat{y}(x) = w^T x$ for some parameter vector $w \in \mathbb{R}^p$ (Nelder & Baker, 1972). Optimizing the statistical loss is then the stochastic convex optimization problem, stated as:

$$\min_{w \in \mathbb{R}^p} \; \mathbb{E}_{x,y}\big[\ell(w^T x, y)\big] \qquad (2.3)$$
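To make Eq. (2.3) concrete, the following minimal Python sketch (assuming a squared loss and synthetic data, neither of which the chapter prescribes) evaluates the GLM estimator $\hat{y}(x) = w^T x$ and a sample-average surrogate of the statistical loss:

import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 1000
w_true = rng.normal(size=p)
X = rng.normal(size=(n, p))                  # n realizations x_n of the feature vector
y = X @ w_true + 0.1 * rng.normal(size=n)    # corresponding realizations y_n

def statistical_loss(w, X, y):
    """Sample-average surrogate of E[l(w^T x, y)] with l(u, y) = (u - y)^2 / 2."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

print(statistical_loss(np.zeros(p), X, y))   # loss at w = 0
print(statistical_loss(w_true, X, y))        # loss near the true parameter (noise floor)

The expectation in Eq. (2.3) is approximated here by a finite sample average; the stochastic methods discussed next avoid forming even this average at every step.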
Observe that to optimize Eq. (2.3), assuming a closed-form solution is unavailable, gradient descent or Newton's method must be used (Boyd & Vandenberghe, 2004). However, either method requires computing the gradient of $L(w) := \mathbb{E}_{x,y}[\ell(w^T x, y)]$, which requires infinitely many realizations $(x_n, y_n)$ of the random pair $(x, y)$ and thus has infinite complexity. This computational bottleneck has been resolved through the development of stochastic approximation (SA) methods (Bottou, 1998a; Robbins & Monro, 1951), which operate on subsets of data examples per step. The most common SA method is SGD, which involves descending along the stochastic gradient $\nabla_w \ell(w^T x_t, y_t)$ rather than the true gradient at each step:
$$w_{t+1} = w_t - \eta_t \, \nabla_w \ell(w_t^T x_t, y_t) \qquad (2.4)$$
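As a minimal sketch of the update in Eq. (2.4), the following Python loop runs SGD on the same squared-loss GLM as above; the diminishing step-size schedule $\eta_t = 1/(10 + t)$ is an illustrative assumption, not a choice made by the chapter:

import numpy as np

rng = np.random.default_rng(1)
p, T = 5, 5000
w_true = rng.normal(size=p)
w = np.zeros(p)

for t in range(T):
    x_t = rng.normal(size=p)                   # one realization of x
    y_t = x_t @ w_true + 0.1 * rng.normal()    # one realization of y
    grad = (w @ x_t - y_t) * x_t               # stochastic gradient of l(w^T x_t, y_t)
    eta_t = 1.0 / (10 + t)                     # illustrative diminishing step size
    w = w - eta_t * grad                       # SGD update, Eq. (2.4)

print(np.linalg.norm(w - w_true))              # distance to the true parameter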
Use of SGD (Eq. 2.4) is prevalent due to its simplicity, ease of use, and the fact that it converges to the minimizer of Eq. (2.3) almost surely, and in expectation at an $O(1/\sqrt{t})$ rate when $L(w)$ is convex and at an $O(1/t)$ rate when it is strongly convex. Efforts to improve the mean convergence rate to $O(1/t^2)$ through the use of Nesterov acceleration (Nemirovski, Juditsky, Lan, & Shapiro, 2009) have also been developed, whose updates are given as:
$$w_{t+1} = v_t - \eta_t \, \nabla_w \ell(v_t^T x_t, y_t) \qquad (2.5)$$
$$v_{t+1} = (1 - \gamma_t)\, w_{t+1} + \gamma_t\, w_t$$
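The accelerated updates in Eq. (2.5) can be sketched the same way; the schedules for $\eta_t$ and $\gamma_t$ below are illustrative assumptions rather than the choices analyzed in the cited work:

import numpy as np

rng = np.random.default_rng(2)
p, T = 5, 5000
w_true = rng.normal(size=p)
w = np.zeros(p)                                 # w_t
v = np.zeros(p)                                 # auxiliary iterate v_t

for t in range(T):
    x_t = rng.normal(size=p)
    y_t = x_t @ w_true + 0.1 * rng.normal()
    grad_v = (v @ x_t - y_t) * x_t              # stochastic gradient evaluated at v_t
    eta_t = 1.0 / (10 + t)                      # illustrative diminishing step size
    gamma_t = t / (t + 3.0)                     # illustrative mixing weight in [0, 1)
    w_next = v - eta_t * grad_v                 # first update of Eq. (2.5)
    v = (1.0 - gamma_t) * w_next + gamma_t * w  # second update of Eq. (2.5)
    w = w_next

print(np.linalg.norm(w - w_true))

Note that the gradient is evaluated at the auxiliary point $v_t$ rather than at $w_t$, which is the distinguishing feature of the accelerated scheme relative to the plain SGD update in Eq. (2.4).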