
5.2 Activation Functions

of the function that is of interest is represented. The important aspect is that there are now two global minima, (1.88, 0.46) and (-1.88, -0.46), and two local minima, (0.15, -1.125) and (-0.15, 1.125).

Corresponding to the global minima we have the parabola represented by a solid line in Figure 5.9a, which fits the data quite well and has an energy minimum of 0.0466. Corresponding to the local minima is a parabola represented by a dotted line in Figure 5.9a, far off the target points, and with an energy minimum of 2.547. As the normal equations would be laborious to solve in this case, a gradient descent method would be preferred. The problem is that if we start the gradient descent at the point (0, -2), for instance, we would end up in the local minimum, with quite a bad solution. This simple example therefore illustrates how drastically a local minimum solution can differ from a global minimum solution, and the need to perform several trials with different starting points when solving an LMS discriminant adjustment by the gradient descent method.
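To make the multiple-starting-points advice concrete, the following Python sketch runs fixed-step gradient descent from several initial points and keeps the lowest-energy result. The energy surface, learning rate, and starting points below are hypothetical choices for illustration; like the book's example, the surface has one global and one shallower local minimum, but the actual LMS energy of the example is not reproduced here.

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2.0 * eps)
    return g

def gradient_descent(f, w0, lr=0.05, steps=10000, tol=1e-10):
    """Fixed-step gradient descent; returns the final point and its energy."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w_next = w - lr * numerical_grad(f, w)
        if np.linalg.norm(w_next - w) < tol:
            break
        w = w_next
    return w, f(w)

def energy(w):
    """Hypothetical non-convex energy: global minimum near x = -1,
    shallower local minimum near x = +1 (illustration only)."""
    x, y = w
    return (x ** 2 - 1.0) ** 2 + 0.3 * x + y ** 2

# Several trials with different starting points; keep the best one.
starts = [(2.0, 0.0), (-2.0, 0.0), (0.5, 1.0)]
results = [gradient_descent(energy, s) for s in starts]
for s, (w, e) in zip(starts, results):
    print(f"start {s} -> minimum {np.round(w, 3)}, energy {e:.4f}")

best_w, best_e = min(results, key=lambda r: r[1])
print(f"best solution: {np.round(best_w, 3)} with energy {best_e:.4f}")
```

Starting at (2.0, 0.0), the descent settles in the local minimum near x = +1, while the start at (-2.0, 0.0) reaches the global minimum near x = -1, mirroring the behaviour described above for the start at (0, -2).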
Usually one has no prior knowledge of which kind of activation function is most suitable for the data. This is, in fact, an issue similar to selecting the most suitable transformation function for the input features. Three popular activation functions have been extensively studied and employed in many software products implementing neural nets. These are:


The step function:

\[
\operatorname{step}(x) =
\begin{cases}
1, & x \ge 0 \\
0, & x < 0
\end{cases}
\tag{5-10a}
\]

The logistic sigmoid function:

\[
\operatorname{sig}(x) = \frac{1}{1 + e^{-ax}}
\tag{5-10b}
\]

The hyperbolic tangent function:

\[
\tanh(x) = \frac{e^{ax} - e^{-ax}}{e^{ax} + e^{-ax}}
\tag{5-10c}
\]
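As a companion to equations (5-10a)-(5-10c), the sketch below implements the three activation functions with NumPy. The 0/1 output convention for the step function and the keyword name a for the scaling factor are assumptions (the scaling factor is taken from the a=1 setting in the figure caption); they are not part of any particular library's API.

```python
import numpy as np

def step(x):
    """Step function (5-10a): 1 for x >= 0, 0 otherwise (0/1 convention assumed)."""
    return np.where(x >= 0.0, 1.0, 0.0)

def sig(x, a=1.0):
    """Logistic sigmoid (5-10b): 1 / (1 + e^(-a x)); output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a * x))

def tanh_act(x, a=1.0):
    """Hyperbolic tangent (5-10c): (e^(ax) - e^(-ax)) / (e^(ax) + e^(-ax)).
    Output range (-1, 1); np.tanh(a * x) computes the same quantity stably."""
    return np.tanh(a * x)

x = np.linspace(-5.0, 5.0, 11)
print(step(x))
print(sig(x))
print(tanh_act(x))
```

Note that the two sigmoids differ only in output range and scale: tanh(ax) = 2 sig(2ax) - 1 (with a = 1 inside sig), which is why software packages often expose them as interchangeable variants.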


Figure 5.10. Common activation functions: (a) Step; (b) Logistic sigmoid with a=1; (c) Tanh sigmoid with a=1.



                                These three functions are represented in Figure 5.10. There are variants of these
                              activation functions, depending on the output ranges and scaling factors, without