Page 39 - Artificial Intelligence for the Internet of Everything


Algorithm 2.2 Basic SGD for Logistic Regression

direction to be a random gradient evaluated at $\xi^k = (X_k, Y_k)$, which is
sampled from the oracle:

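\[
\begin{aligned}
g(w_k, \xi^k) &= \nabla F(w_k, \xi^k) \\
&= \nabla \ell(w_k; x_k, y_k) \\
&= -x_k \left( y_k - \frac{e^{w_k^{T} x_k}}{1 + e^{w_k^{T} x_k}} \right)
\end{aligned}
\tag{2.2}
\]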
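
The last line of (2.2) can be checked with a short derivation. The sketch below assumes labels $y_k \in \{0, 1\}$ and the negative log-likelihood loss; neither convention is stated on this page, so treat both as assumptions.

```latex
% Assumed per-sample loss (negative log-likelihood, labels y_k in {0,1}):
\ell(w; x_k, y_k) = -y_k\, w^{T} x_k + \log\!\left(1 + e^{w^{T} x_k}\right)

% Differentiate with respect to w, term by term:
\nabla_w \ell(w; x_k, y_k)
  = -y_k\, x_k + \frac{e^{w^{T} x_k}}{1 + e^{w^{T} x_k}}\, x_k
  = -x_k \left( y_k - \frac{e^{w^{T} x_k}}{1 + e^{w^{T} x_k}} \right)
```

Evaluating this gradient at $w = w_k$ gives the single-sample search direction in (2.2).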
The implemented algorithm is then (Algorithm 2.2):
Again, note that each iteration evaluates only a single training point.
In contrast, the full-gradient method would have to use the entire
dataset at every iteration.
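
The difference in per-iteration cost can be made concrete with a minimal Python sketch of basic SGD for logistic regression. It is an illustration, not the book's Algorithm 2.2: the function names, the constant step size, the iteration count, and the toy data are all hypothetical, and labels are assumed to be 0 or 1.

```python
import math
import random


def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def stochastic_gradient(w, x_k, y_k):
    """Gradient of the loss at one training point, as in Eq. (2.2).

    Assumes y_k is 0 or 1 (an assumption, not stated in the text).
    """
    residual = y_k - sigmoid(dot(w, x_k))
    return [-xi * residual for xi in x_k]


def full_gradient(w, X, Y):
    """Average gradient over the whole dataset: touches every point."""
    grad = [0.0] * len(w)
    for x_k, y_k in zip(X, Y):
        g = stochastic_gradient(w, x_k, y_k)
        grad = [a + b / len(X) for a, b in zip(grad, g)]
    return grad


def sgd_logistic(X, Y, step_size=0.1, n_iters=5000, seed=0):
    """Basic SGD: each iteration samples one point and steps against its gradient."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(n_iters):
        k = rng.randrange(len(X))  # sample one training point (X_k, Y_k)
        g = stochastic_gradient(w, X[k], Y[k])
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def make_toy_data(n=200, w_true=(2.0, -1.0), seed=1):
    """Hypothetical synthetic data drawn from a logistic model."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    Y = [1.0 if rng.random() < sigmoid(dot(w_true, x)) else 0.0 for x in X]
    return X, Y


X, Y = make_toy_data()
w_hat = sgd_logistic(X, Y)
print(w_hat)
```

Here `stochastic_gradient` reads one row of the data per call, whereas `full_gradient` loops over all of it, so a single full-gradient step costs roughly as much as `len(X)` SGD steps.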

2.3.4 SGD Variants

As mentioned in Section 2.1, the basic SGD algorithm has some room for
improvement. In this section we introduce two popular SGD variants:
mini-batch SGD and SGD with momentum. Each variant gives a different yet
useful way of improving upon the basic SGD algorithm.

2.3.4.1 Mini-Batch SGD
One of the major issues with SGD is that its search directions have high
variance. Instead of moving downhill as intended, the algorithm may wander
