around randomly more than we would like. A natural idea is to use a larger sample size for the gradient estimates. The basic SGD uses a sample size of n = 1, so it makes sense that we might wish to use more draws from the oracle to reduce the variance. Specifically, we would reduce the variance in gradient estimates by a factor of 1/n, where n is the batch size. The mini-batch algorithm is implemented as shown in Algorithm 2.3.
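As a rough illustration of the idea (a minimal Python sketch, not the chapter's Algorithm 2.3; the gradient oracle, step size, and batch size used here are hypothetical placeholders), averaging n oracle draws per step might look like this:

```python
import numpy as np

def minibatch_sgd(grad_oracle, x0, alpha, n, num_iters, rng=None):
    """Minimal mini-batch SGD sketch (illustrative, not Algorithm 2.3).

    grad_oracle(x, rng) is assumed to return one stochastic gradient
    sample at x; averaging n independent draws per step reduces the
    gradient variance by roughly a factor of 1/n.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        # Average n independent draws from the stochastic gradient oracle.
        g = np.mean([grad_oracle(x, rng) for _ in range(n)], axis=0)
        x = x - alpha * g  # fixed step size alpha, as in the bounds below
    return x

# Example: noisy gradient oracle for f(x) = 0.5 * ||x||^2 (true gradient is x).
noisy_grad = lambda x, rng: x + rng.normal(scale=1.0, size=x.shape)
x_final = minibatch_sgd(noisy_grad, x0=np.ones(5), alpha=0.1, n=32, num_iters=500)
```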
It is shown in Bottou, Curtis, and Nocedal (2016) that under assumptions of smoothness and strong convexity, standard SGD (sample size n = 1 with fixed α) achieves an error rate of:

$$
\mathbb{E}\left[ f(x_k) - f(x^*) \right] \le \frac{\alpha L M}{2 c \mu} + (1 - \alpha c \mu)^{k-1} \left( f(x_1) - f(x^*) - \frac{\alpha L M}{2 c \mu} \right)
$$
This convergence rate has several assumption-dependent constants such as the Lipschitz constant L and strong-convexity parameter c. However, the defining feature of this bound is the $(1 - \alpha c \mu)^{k-1}$ term, which implies that the error decreases exponentially to the constant $\frac{\alpha L M}{2 c \mu}$. Using mini-batch SGD, on the other hand, yields the following bound:
$$
\mathbb{E}\left[ f(x_k) - f(x^*) \right] \le \frac{1}{n} \frac{\alpha L M}{2 c \mu} + (1 - \alpha c \mu)^{k-1} \left( f(x_1) - f(x^*) - \frac{1}{n} \frac{\alpha L M}{2 c \mu} \right)
$$
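To make the comparison concrete, a small sketch (with purely illustrative, made-up parameter values; none of these numbers come from the chapter) can evaluate the two asymptotic error constants side by side:

```python
# Illustrative (made-up) values for alpha, L, M, c, mu and batch size n.
alpha, L, M, c, mu, n = 0.01, 10.0, 4.0, 1.0, 1.0, 32

const_sgd = alpha * L * M / (2 * c * mu)   # error constant for n = 1:  0.2
const_minibatch = const_sgd / n            # mini-batch constant (1/n): 0.00625

print(const_sgd, const_minibatch)
```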
Thus the mini-batch algorithm converges at the same rate, but to a smaller
error constant. This is certainly an improvement in terms of the error bound,
Algorithm 2.3 Mini-Batch SGD