Artificial Intelligence for the Internet of Everything

Uncertainty Quantification in Internet of Battlefield Things


   Of course, the optimization literature provides numerous algorithms,
including gradient descent, Newton's method, quasi-Newton methods,
and nonlinear conjugate gradient (CG). In many applications these methods
have been widely used for some time with much success. However, the size
of contemporary datasets, as well as complex model structure, makes
traditional optimization algorithms ill suited for contemporary ML
problems. For example, consider the standard gradient descent algorithm.
When the objective function is strongly convex, the training uncertainty
(or training error), f(x_k) - f(x*), is proportional to ρ^k for some
ρ ∈ (0, 1), where k is the number of iterations of the algorithm. Each
iteration, however, requires a full pass over all n training samples, so
the computational cost of driving the training uncertainty below a fixed
tolerance grows linearly with the size of the data. For large datasets,
gradient descent clearly becomes impractical.
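To make the per-iteration cost concrete, here is a minimal sketch of full-batch gradient descent on a synthetic strongly convex least-squares problem (the data, dimensions, step size, and names such as w_star are illustrative assumptions, not from the text): the error contracts by a roughly constant factor per iteration, but every step touches all n samples.

```python
import numpy as np

# Synthetic strongly convex objective: f(w) = (1/2n) * ||Xw - y||^2.
rng = np.random.default_rng(0)
n, d = 1000, 5                                  # n samples, d parameters
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                                  # noiseless labels for clarity

w_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference optimum x*

def full_gradient(w):
    # One full pass over all n samples: per-iteration cost grows with n.
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
step = 0.1
errors = []
for k in range(200):
    w -= step * full_gradient(w)
    errors.append(np.linalg.norm(w - w_star))

# Linear (geometric) convergence: the distance to the optimum shrinks by
# a roughly constant factor rho per iteration, i.e. like rho**k.
```

Doubling n here would roughly double the cost of each `full_gradient` call while leaving the per-iteration contraction factor essentially unchanged, which is the scaling problem described above.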
   Stochastic gradient descent (SGD), on the other hand, has training
uncertainty proportional to 1/k. Although this error decreases more
slowly in terms of k, it is independent of the sample size, giving SGD a
clear advantage in modern settings. Moreover, in a stochastic optimization
setting, SGD actually achieves the optimal complexity rate in terms of
k. These two features, as well as SGD's simplicity, have allowed it to
become the standard optimization algorithm for most large-scale ML
problems, such as training neural nets. However, SGD is not without
its drawbacks. In particular, it performs poorly on badly scaled problems
(when the Hessian is ill conditioned) and can have high variance in its
step directions. While SGD's performance starts off particularly strong, its
convergence to the optimum quickly slows. Although no stochastic algo-
rithm can do better than SGD in terms of error uncertainty (in terms of
k), there is certainly potential to improve the convergence rate up to a
constant factor.
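The contrast with full-batch gradient descent can be sketched as follows, again on a synthetic least-squares problem (the setup, step schedule, and names such as w_true are illustrative assumptions): each SGD step touches a single randomly drawn sample, so its cost is independent of n, and diminishing step sizes give the O(1/k) error behavior described above.

```python
import numpy as np

# Synthetic problem with a large sample size; SGD never does a full pass.
rng = np.random.default_rng(1)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)       # noisy labels

def stochastic_gradient(w, i):
    # Gradient of the single-sample loss 0.5 * (X[i] @ w - y[i])**2;
    # evaluating it costs O(d), independent of n.
    return X[i] * (X[i] @ w - y[i])

w = np.zeros(d)
for k in range(1, 20_001):
    i = rng.integers(n)                         # draw one sample at random
    step = 1.0 / (20 + k)                       # diminishing steps, O(1/k) rate
    w -= step * stochastic_gradient(w, i)
```

With the diminishing steps the iterates should settle near the optimum; a constant step size would instead make rapid early progress and then stall at a noise floor, illustrating the high variance in SGD's step directions noted above.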
   With regard to uncertainty, we would like to achieve a sufficiently
small error for our model as quickly as possible. In the next section we
discuss the formal stochastic optimization problem and the standard SGD
algorithm, along with two popular variants of SGD. In Section 2.3 we
propose an avenue for the development of an SGD variant that seems
especially promising, and discuss avenues of research for UQ in model
predictions.


              2.3.1 Optimization Problem
The standard stochastic optimization problem is:

    min_w f(w) = E[F(w; ξ)]                                           (2.1)
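As a concrete illustration of (2.1), the following sketch takes F(w; ξ) to be a squared loss, with ξ = (x, y) a random data point drawn from a synthetic distribution (the distribution, noise level, and names such as w_true are illustrative assumptions); the objective f(w) = E[F(w; ξ)] is estimated by an empirical average over samples.

```python
import numpy as np

# Illustrative instance of (2.1): xi = (x, y) is a random data point and
# F(w; xi) = (x @ w - y)**2 is the per-sample squared loss.
rng = np.random.default_rng(2)
d = 3
w_true = np.ones(d)

def sample_xi(m):
    # Draw m samples of xi from a synthetic data distribution.
    x = rng.normal(size=(m, d))
    y = x @ w_true + 0.1 * rng.normal(size=m)
    return x, y

def F(w, x, y):
    return (x @ w - y) ** 2                     # per-sample loss F(w; xi)

# Monte Carlo estimate of f(w) = E[F(w; xi)] at w = w_true: the residual
# there is pure label noise with variance 0.1**2 = 0.01.
x, y = sample_xi(100_000)
f_hat = F(w_true, x, y).mean()
```

In practice the expectation in (2.1) is unavailable and only such sample averages (or single-sample estimates, as in SGD) can be computed, which is what makes the problem stochastic.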