





5.2 Activation Functions

We have mentioned previously that, in order to obtain more complex
discriminants, some type of non-linear function will have to be used, as shown in
Figure 5.7. The non-linear function f is called an activation function. The
generalized decision function can now be written as:

   d(x) = f(w'x).    (5-8)
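As an illustration of equation (5-8), the following minimal sketch computes the linear part w'x and passes it through an activation function. The weights, the input vector, the convention of carrying the bias as a trailing 1 in x, and the choice of tanh as a sample non-linearity are all assumptions made for this example; they are not taken from the text.

```python
import math

def decision(w, x, f):
    """Return d(x) = f(w'x) for weight vector w, feature vector x, activation f."""
    net = sum(wi * xi for wi, xi in zip(w, x))  # linear part w'x
    return f(net)

# Hypothetical weights and input; the trailing 1.0 in x carries the bias term.
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 1.0]

linear = decision(w, x, lambda net: net)   # identity: plain linear discriminant
squashed = decision(w, x, math.tanh)       # tanh chosen only as an illustration

print(linear, squashed)
```

With the identity as activation, the unit reduces to the linear discriminant; any other choice of f changes only the last step.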


















Figure 5.7. Connectionist structure of a generalized decision function. The
processing unit (rectangle with dotted line) implements f(x).



  Let us consider the one-dimensional two-class example shown in Figure 5.8a,
with three points -1, 0 and 1 and respective target values 1, -1, 1. We now have a
class (target value 1) with disconnected regions (points -1 and 1). We try to
discriminate between the two classes by using the following parabolic activation
function:

   d(x) = f(ax + b) = (ax + b)² - 1.    (5-9a)





  In order to study the energy function in a simple way, we assume, as in the
previous example of Figure 5.2, that the parabolic activation function is directly
applied to the linear discriminant, as shown in equation (5-9a). From now on, we
represent the output neuron of the discriminant unit, which contains both the
summation and the activation function, by an open circle, as in Figure 5.8b.
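The setup described above can be explored numerically. The sketch below is not the author's code: it assumes the parabolic activation has the form f(s) = s² - 1, that the energy is half the sum of squared errors over the three training points, and that the weights a and b are scanned over the range -3 to 3 in steps of 0.5. The names `f`, `d`, `energy`, `grid` and `surface` are introduced here for illustration only.

```python
import math

# Training set from the example: three points and their target values.
X = [-1.0, 0.0, 1.0]
T = [1.0, -1.0, 1.0]

def f(s):
    # Assumed form of the parabolic activation function.
    return s * s - 1.0

def d(x, a, b):
    # Activation applied directly to the linear discriminant a*x + b.
    return f(a * x + b)

def energy(a, b):
    # Assumed energy: half the sum of squared errors over the training set.
    return 0.5 * sum((d(x, a, b) - t) ** 2 for x, t in zip(X, T))

# Scan the weights on a grid with step 0.5 (the range is an assumption).
grid = [k * 0.5 for k in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
surface = {(a, b): energy(a, b) for a in grid for b in grid}

best = min(surface.values())
grid_minima = sorted(ab for ab, e in surface.items() if e == best)

print(grid_minima)                  # grid points with the lowest energy
print(energy(math.sqrt(2.0), 0.0))  # energy at an off-grid weight pair
```

Under these assumptions the lowest grid values fall at (1.5, 0) and (-1.5, 0), while the energy at a = √2, b = 0 is zero up to rounding error. A step of 0.5 cannot land exactly on √2, which is one way a coarse grid distorts the plotted surface.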
  The energy function is symmetric and is partly represented in Figure 5.8c. The
rugged aspect of the surface is due solely to the discrete step of 0.5 used for the
weights a and b. There are two global minima at (√2, 0) and (-√2, 0), both
corresponding to the obvious solution d(x) = 2x² - 1, passing exactly through the target