Artificial Intelligence for the Internet of Everything

Uncertainty Quantification in Internet of Battlefield Things


   Of course, the optimization literature provides numerous algorithms,
including gradient descent, Newton's method, quasi-Newton methods,
and nonlinear conjugate gradient (CG). In many applications these methods
have been widely used for some time with much success. However, the size
of contemporary datasets, as well as complex model structure, makes
traditional optimization algorithms ill suited for contemporary ML
problems. For example, consider the standard gradient descent algorithm.
When the objective function is strongly convex, the training uncertainty
(or training error), f(x_k) - f(x*), is proportional to ρ^k for some
ρ ∈ (0, 1), where k is the number of iterations of the algorithm. Each
iteration, however, requires a full pass over all n training samples, so
the computational cost of driving the training uncertainty below a fixed
tolerance grows linearly with the size of the data. For large datasets,
gradient descent clearly becomes impractical.
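To make the per-iteration cost concrete, here is a minimal sketch of full-batch gradient descent on a synthetic strongly convex least-squares problem (the data, dimensions, step size, and names such as w_star are illustrative assumptions, not from the text): the error contracts by a roughly constant factor per iteration, but every step touches all n samples.

```python
import numpy as np

# Synthetic strongly convex objective: f(w) = (1/2n) * ||Xw - y||^2.
rng = np.random.default_rng(0)
n, d = 1000, 5                                  # n samples, d parameters
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                                  # noiseless labels for clarity

w_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference optimum x*

def full_gradient(w):
    # One full pass over all n samples: per-iteration cost grows with n.
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
step = 0.1
errors = []
for k in range(200):
    w -= step * full_gradient(w)
    errors.append(np.linalg.norm(w - w_star))

# Linear (geometric) convergence: the distance to the optimum shrinks by
# a roughly constant factor rho per iteration, i.e. like rho**k.
```

Doubling n here would roughly double the cost of each `full_gradient` call while leaving the per-iteration contraction factor essentially unchanged, which is the scaling problem described above.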
   Stochastic gradient descent (SGD), on the other hand, has training
uncertainty proportional to 1/k. Although this error decreases more
slowly in terms of k, it is independent of the sample size, giving SGD a
clear advantage in modern settings. Moreover, in a stochastic optimization
setting, SGD actually achieves the optimal complexity rate in terms of
k. These two features, as well as SGD's simplicity, have allowed it to
become the standard optimization algorithm for most large-scale ML
problems, such as training neural nets. However, SGD is not without
its drawbacks. In particular, it performs poorly on badly scaled problems
(when the Hessian is ill conditioned) and can have high variance in its
step directions. While SGD's performance starts off particularly strong, its
convergence to the optimum quickly slows. Although no stochastic algo-
rithm can do better than SGD in terms of error uncertainty (in terms of
k), there is certainly potential to improve the convergence rate up to a
constant factor.
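The contrast with full-batch gradient descent can be sketched as follows, again on a synthetic least-squares problem (the setup, step schedule, and names such as w_true are illustrative assumptions): each SGD step touches a single randomly drawn sample, so its cost is independent of n, and diminishing step sizes give the O(1/k) error behavior described above.

```python
import numpy as np

# Synthetic problem with a large sample size; SGD never does a full pass.
rng = np.random.default_rng(1)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)       # noisy labels

def stochastic_gradient(w, i):
    # Gradient of the single-sample loss 0.5 * (X[i] @ w - y[i])**2;
    # evaluating it costs O(d), independent of n.
    return X[i] * (X[i] @ w - y[i])

w = np.zeros(d)
for k in range(1, 20_001):
    i = rng.integers(n)                         # draw one sample at random
    step = 1.0 / (20 + k)                       # diminishing steps, O(1/k) rate
    w -= step * stochastic_gradient(w, i)
```

With the diminishing steps the iterates should settle near the optimum; a constant step size would instead make rapid early progress and then stall at a noise floor, illustrating the high variance in SGD's step directions noted above.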
   With regard to uncertainty, we would like to achieve a sufficiently
small error for our model as quickly as possible. In the next section we
discuss the formal stochastic optimization problem and the standard SGD
algorithm, along with two popular variants of SGD. In Section 2.3 we
propose an avenue for the development of an SGD variant that seems
especially promising, and discuss avenues of research for UQ in model
predictions.


              2.3.1 Optimization Problem
The standard stochastic optimization problem is:

    min_w f(w) = E[F(w; ξ)]                                           (2.1)
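As a concrete illustration of (2.1), the following sketch takes F(w; ξ) to be a squared loss, with ξ = (x, y) a random data point drawn from a synthetic distribution (the distribution, noise level, and names such as w_true are illustrative assumptions); the objective f(w) = E[F(w; ξ)] is estimated by an empirical average over samples.

```python
import numpy as np

# Illustrative instance of (2.1): xi = (x, y) is a random data point and
# F(w; xi) = (x @ w - y)**2 is the per-sample squared loss.
rng = np.random.default_rng(2)
d = 3
w_true = np.ones(d)

def sample_xi(m):
    # Draw m samples of xi from a synthetic data distribution.
    x = rng.normal(size=(m, d))
    y = x @ w_true + 0.1 * rng.normal(size=m)
    return x, y

def F(w, x, y):
    return (x @ w - y) ** 2                     # per-sample loss F(w; xi)

# Monte Carlo estimate of f(w) = E[F(w; xi)] at w = w_true: the residual
# there is pure label noise with variance 0.1**2 = 0.01.
x, y = sample_xi(100_000)
f_hat = F(w_true, x, y).mean()
```

In practice the expectation in (2.1) is unavailable and only such sample averages (or single-sample estimates, as in SGD) can be computed, which is what makes the problem stochastic.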