of the function that is of interest is represented. The important aspect is that there
are now two global minima, at (1.88, 0.46) and (-1.88, -0.46), and two local minima,
at (0.15, -1.125) and (-0.15, 1.125).
Corresponding to the global minima we have the parabola represented by a solid
line in Figure 5.9a, which fits the data quite well and has an energy minimum of
0.0466. Corresponding to the local minima is a parabola, represented by a dotted
line in Figure 5.9a, which lies far from the target points and has an energy minimum of 2.547.
As the normal equations would be laborious to solve in this case, a gradient
descent method is preferred. The problem is that if we start the gradient
descent at the point (0, -2), for instance, we end up in the local minimum,
with quite a bad solution. This simple example therefore illustrates how
drastically different a local minimum solution can be from the global minimum
solution, and why several trials with different starting points are needed when
adjusting an LMS discriminant by gradient descent, as sketched below.
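The effect of the starting point can be shown with a short sketch. The energy surface of the parabola-fitting example is not reproduced in this section, so the code below uses a hypothetical two-parameter energy with one deep minimum and one shallower local minimum; the names energy, numerical_grad and gradient_descent are illustrative only.

```python
import numpy as np

# Hypothetical two-parameter energy surface: a deep (global) minimum near
# w0 = -1 and a shallower local minimum near w0 = +1. It stands in for the
# LMS energy of the parabola fit, which is not reproduced here.
def energy(w):
    w0, w1 = w
    return (w0**2 - 1.0)**2 + (w1 - w0)**2 + 0.3 * w0

def numerical_grad(f, w, eps=1e-6):
    """Central-difference gradient, so no analytic derivative is needed."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

def gradient_descent(f, w_start, lr=0.005, n_steps=20000):
    """Plain gradient descent from a given starting point."""
    w = np.asarray(w_start, dtype=float)
    for _ in range(n_steps):
        w = w - lr * numerical_grad(f, w)
    return w

# Starting on the 'wrong' side of the surface traps the search in the
# shallow minimum; a restart from another point reaches the deeper one.
for w_start in [(1.5, 0.0), (-1.5, 0.0)]:
    w = gradient_descent(energy, w_start)
    print(w_start, "->", np.round(w, 3), "energy:", round(energy(w), 4))
```

Running the two starts shows two different solutions with clearly different energies, which is exactly why several restarts, keeping the lowest-energy result, are advisable.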
Usually one has no previous knowledge of which kind of activation function is
best suited to the data. This is, in fact, an issue similar to selecting the most
suitable transformation function for the input features. Three activation
functions have been popularised, extensively studied, and employed in many
software products implementing neural nets. These are:
The step function:

u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (5-10a)

The logistic sigmoid function:

\mathrm{sig}(x) = \dfrac{1}{1 + e^{-ax}}    (5-10b)

The hyperbolic tangent function:

\tanh(x) = \dfrac{e^{ax} - e^{-ax}}{e^{ax} + e^{-ax}}    (5-10c)
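As a quick sketch, the three functions can be written down directly; the scaling factor a (set to 1 in Figure 5.10) is kept as an explicit parameter, and the step function is assumed to output 0/1 with the threshold at zero, matching the definitions above.

```python
import numpy as np

def step(x):
    """Step activation: 1 for x >= 0, 0 otherwise (equation 5-10a)."""
    return np.where(x >= 0, 1.0, 0.0)

def logistic_sigmoid(x, a=1.0):
    """Logistic sigmoid 1 / (1 + exp(-a*x)); outputs lie in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a * x))

def tanh_sigmoid(x, a=1.0):
    """Hyperbolic tangent (exp(a*x) - exp(-a*x)) / (exp(a*x) + exp(-a*x));
    outputs lie in (-1, 1)."""
    return np.tanh(a * x)

x = np.linspace(-5.0, 5.0, 5)
print(step(x))              # [0. 0. 1. 1. 1.]
print(logistic_sigmoid(x))  # values between 0 and 1
print(tanh_sigmoid(x))      # values between -1 and 1
```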
Figure 5.10. Common activation functions: (a) Step; (b) Logistic sigmoid with
a=1; (c) Tanh sigmoid with a=1.
These three functions are represented in Figure 5.10. There are variants of these
activation functions, depending on the output ranges and scaling factors, without