Page 40 - Artificial Intelligence for the Internet of Everything

Uncertainty Quantification in Internet of Battlefield Things  27


around randomly more than we would like. A natural idea is to use a larger sample size for the gradient estimates. The basic SGD uses a sample size of n = 1, so it makes sense that we might wish to use more draws from the oracle to reduce the variance. Specifically, we would reduce the variance in gradient estimates by a factor of 1/n, where n is the batch size. The mini-batch algorithm is implemented as shown in Algorithm 2.3.
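The 1/n variance reduction can be checked empirically. The sketch below uses a hypothetical noisy gradient oracle (a scalar true gradient of 2.0 plus unit-variance Gaussian noise, our own illustrative choice, not from the text) and compares the variance of single-sample gradient estimates against mini-batch averages with n = 16:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy gradient oracle: true gradient 2.0 plus unit-variance noise.
def grad_oracle(size):
    return 2.0 + rng.normal(0.0, 1.0, size)

trials = 200_000
# Variance of single-draw estimates (n = 1) vs. mini-batch means (n = 16).
var_n1 = np.var(grad_oracle(trials))
var_n16 = np.var(grad_oracle((trials, 16)).mean(axis=1))

print(var_n1 / var_n16)  # close to 16, i.e., variance shrinks by a factor of n
```

The measured ratio approaches the batch size n, matching the 1/n factor stated above.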
It is shown in Bottou, Curtis, and Nocedal (2016) that under assumptions of smoothness and strong convexity, standard SGD (sample size n = 1 with fixed α) achieves an error rate of:

$$ f(x_k) - f(x^*) \le \frac{\alpha L M}{2 c \mu} + (1 - \alpha c \mu)^{k-1} \left( f(x_1) - f(x^*) - \frac{\alpha L M}{2 c \mu} \right) $$
This convergence rate has several assumption-dependent constants, such as the Lipschitz constant L and the strong-convexity parameter c. However, the defining feature of that algorithm is the $(1 - \alpha c \mu)^{k-1}$ term, which implies that the error decreases exponentially to the constant $\frac{\alpha L M}{2 c \mu}$. Using mini-batch SGD, on the other hand, yields the following bound:

$$ f(x_k) - f(x^*) \le \frac{1}{n} \, \frac{\alpha L M}{2 c \mu} + (1 - \alpha c \mu)^{k-1} \left( f(x_1) - f(x^*) - \frac{1}{n} \, \frac{\alpha L M}{2 c \mu} \right) $$

              Thus the mini-batch algorithm converges at the same rate, but to a smaller
              error constant. This is certainly an improvement in terms of the error bound,





                 Algorithm 2.3 Mini-Batch SGD
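A minimal sketch of mini-batch SGD, consistent with the description above: at each iteration, n draws from the stochastic gradient oracle are averaged before taking a fixed-step-size step. The quadratic test objective and noise model are our own illustrative assumptions, not taken from the text.

```python
import numpy as np

def minibatch_sgd(grad_oracle, x0, alpha, n, iters, rng):
    """Mini-batch SGD: average n oracle draws, then take a step of size alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Averaging n draws reduces the gradient-estimate variance by 1/n.
        g = np.mean([grad_oracle(x, rng) for _ in range(n)], axis=0)
        x = x - alpha * g
    return x

# Illustrative strongly convex problem: f(x) = 0.5 * ||x||^2, gradient x,
# with additive Gaussian noise standing in for the stochastic oracle.
def noisy_grad(x, rng):
    return x + rng.normal(0.0, 1.0, size=x.shape)

rng = np.random.default_rng(1)
x = minibatch_sgd(noisy_grad, x0=np.array([5.0, -3.0]),
                  alpha=0.1, n=32, iters=500, rng=rng)
print(x)  # hovers near the minimizer x* = 0
```

With a fixed step size the iterates do not converge exactly; they settle into a noise ball around x* whose radius shrinks with larger n, mirroring the 1/n factor in the bound above.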