
                       is defined as

\[
\text{Error}_j = \sum_{p=1}^{P} \left( o_{jp} - d_{jp} \right)^2 \qquad (32.20)
\]

then the derivative of the error with respect to the weight w_ij is

\[
\frac{d\,\text{Error}_j}{d w_{ij}} = 2 \sum_{p=1}^{P} \left( o_{jp} - d_{jp} \right) \frac{d f(\text{net}_{jp})}{d\,\text{net}_{jp}} \, x_i \qquad (32.21)
\]
since o = f(net) and net is given by Eq. (32.2). Note that this derivative is proportional to the derivative of the activation function f′(net). Thus, this type of approach is possible only for continuous activation functions; the method cannot be used with the hard activation functions of Eqs. (32.4) and (32.5). In this respect the LMS method is more general. The derivatives of the most common continuous activation functions are

\[
f' = o(1 - o) \qquad (32.22)
\]
                       for the unipolar, Eq. (32.6), and

\[
f' = 0.5\,(1 - o^2) \qquad (32.23)
\]
                       for the bipolar, Eq. (32.7).
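Both derivatives can be evaluated directly from the neuron output. The sketch below is illustrative only; it assumes the standard unipolar sigmoid o = 1/(1 + exp(−net)) for Eq. (32.6) and the bipolar sigmoid o = tanh(net/2) for Eq. (32.7), both with the gain coefficient set to one, and the function names are not the handbook's.

import numpy as np

def unipolar(net):
    # Unipolar sigmoid, o = 1 / (1 + exp(-net)); assumed form of Eq. (32.6)
    return 1.0 / (1.0 + np.exp(-net))

def unipolar_deriv(o):
    # Eq. (32.22): f' = o(1 - o), expressed through the output o
    return o * (1.0 - o)

def bipolar(net):
    # Bipolar sigmoid, o = 2/(1 + exp(-net)) - 1 = tanh(net/2); assumed form of Eq. (32.7)
    return np.tanh(0.5 * net)

def bipolar_deriv(o):
    # Eq. (32.23): f' = 0.5(1 - o^2), expressed through the output o
    return 0.5 * (1.0 - o * o)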
Using the cumulative approach, the neuron weight w_ij should be changed in the direction of the gradient:
\[
\Delta w_{ij} = c\, x_i \sum_{p=1}^{P} \left( d_{jp} - o_{jp} \right) f'_{jp} \qquad (32.24)
\]

In the case of incremental training, the weights are updated after each applied pattern:

\[
\Delta w_{ij} = c\, x_i\, f'_j \left( d_j - o_j \right) \qquad (32.25)
\]
The weight change is proportional to the input signal x_i, to the difference between the desired and actual outputs d_jp − o_jp, and to the derivative of the activation function f′_jp. As with the LMS rule, weights can be updated by either the incremental or the cumulative method. In comparison to the LMS rule, the delta rule always leads to a solution close to the optimum. As illustrated in Fig. 32.8, when the delta rule is used, all four patterns are classified correctly.
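As an illustration only (not code from the handbook), the following sketch applies Eqs. (32.24) and (32.25) to a single neuron with the unipolar activation and its derivative from Eq. (32.22); the learning constant c = 0.1 and the array shapes (X holds the P input patterns row by row) are assumed.

import numpy as np

def delta_rule_epoch(w, X, d, c=0.1, incremental=True):
    # One training epoch of the delta rule for a single unipolar-sigmoid neuron.
    if incremental:
        # Eq. (32.25): weights are corrected after every applied pattern
        for x_p, d_p in zip(X, d):
            o = 1.0 / (1.0 + np.exp(-np.dot(w, x_p)))     # o = f(net)
            w = w + c * (d_p - o) * o * (1.0 - o) * x_p   # f' = o(1 - o), Eq. (32.22)
    else:
        # Eq. (32.24): the gradient is accumulated over all P patterns first
        o = 1.0 / (1.0 + np.exp(-X @ w))
        w = w + c * X.T @ ((d - o) * o * (1.0 - o))
    return w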

                       Error Backpropagation Learning
The delta learning rule can be generalized for multilayer networks. Using an approach similar to the delta rule, the gradient of the global error can be computed with respect to each weight in the network. Interestingly,

\[
\Delta w_{ij} = c\, x_i\, f'_j\, E_j \qquad (32.26)
\]


where
  c = learning constant,
  x_i = signal on the ith neuron input,
  f′_j = derivative of the activation function.
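A minimal sketch of the resulting update for a two-layer network follows. It is not the handbook's code: it assumes unipolar sigmoid neurons and interprets E_j as the error signal propagated back to neuron j (the output error d_j − o_j at the output layer, and the weighted sum of the downstream error terms at the hidden layer); the layer structure, variable names, and learning constant are illustrative.

import numpy as np

def backprop_step(W1, W2, x, d, c=0.1):
    # One weight update of error backpropagation for a two-layer network.
    f = lambda net: 1.0 / (1.0 + np.exp(-net))    # unipolar sigmoid
    # forward pass
    o1 = f(W1 @ x)         # hidden-layer outputs
    o2 = f(W2 @ o1)        # output-layer outputs
    # at the output layer the propagated error E_j is simply d_j - o_j
    delta2 = (d - o2) * o2 * (1.0 - o2)           # f'_j * E_j
    # at the hidden layer E_j is the weighted sum of the downstream deltas
    delta1 = (W2.T @ delta2) * o1 * (1.0 - o1)    # f'_j * E_j
    # Eq. (32.26): delta_w_ij = c * x_i * f'_j * E_j
    W2 = W2 + c * np.outer(delta2, o1)
    W1 = W1 + c * np.outer(delta1, x)
    return W1, W2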
