
Since the activations of neurons of some layer m do not affect the neurons of preceding layers l < m, the corresponding pairwise sensitivities are identically zero, i.e.,

\nu_{i,j}^{l,m} = 0, \quad m > l.   (2.69)

The remaining pairwise sensitivities are computed during the forward pass, along with the weighted sums n_i^l and activations a_i^l, i.e.,

\nu_{i,j}^{l,m} = \sum_{k=1}^{S^{l-1}} w_{i,k}^{l}\, \varphi^{l-1\,\prime}(n_k^{l-1})\, \nu_{k,j}^{l-1,m}, \quad l = 2, \ldots, L.   (2.70)

Finally, the derivatives of neural network outputs with respect to parameters are expressed in terms of pairwise sensitivities, i.e.,

\frac{\partial a_i^L}{\partial b_j^m} = \varphi^{L\prime}(n_i^L)\, \nu_{i,j}^{L,m}, \qquad
\frac{\partial a_i^L}{\partial w_{j,k}^m} = \varphi^{L\prime}(n_i^L)\, \nu_{i,j}^{L,m}\, a_k^{m-1}.   (2.71)
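As an illustration of Eqs. (2.70)–(2.71), here is a minimal NumPy sketch (not from the book) that propagates \nu^{l,m} during the forward pass for one fixed layer index m and then assembles the output derivatives with respect to the biases and weights of that layer. The data layout (1-based lists W[1..L], b[1..L]) and all identifiers are assumptions of the sketch; the base case \nu_{i,j}^{m,m} = \delta_{ij} follows directly from the definition of the pairwise sensitivities (cf. Eq. (2.68)).

```python
import numpy as np

def forward_with_sensitivities(W, b, phi, dphi, a0, m):
    """Forward pass that also propagates the pairwise sensitivities
    nu^{l,m}_{i,j} = d n_i^l / d n_j^m for one fixed layer index m,
    following Eqs. (2.69)-(2.70), and then applies Eq. (2.71).

    W[1..L], b[1..L] hold the weight matrices and bias vectors (index 0 unused);
    phi, dphi are the activation function and its first derivative.
    This layout and the function name are assumptions of the sketch."""
    L = len(b) - 1
    a, n, nu = [a0], [None], [None]
    for l in range(1, L + 1):
        n.append(W[l] @ a[l - 1] + b[l])      # weighted sums n^l, Eq. (2.8)
        a.append(phi(n[l]))                   # activations a^l
        if l < m:
            nu.append(None)                   # identically zero, Eq. (2.69)
        elif l == m:
            nu.append(np.eye(n[l].size))      # base case: d n^m / d n^m = I
        else:                                 # forward recursion, Eq. (2.70)
            nu.append((W[l] * dphi(n[l - 1])) @ nu[l - 1])
    # Output derivatives w.r.t. the biases of layer m, Eq. (2.71)
    da_db = dphi(n[L])[:, None] * nu[L]       # shape (S^L, S^m)
    # Output derivatives w.r.t. the weights of layer m, Eq. (2.71)
    da_dW = np.einsum('ij,k->ijk', da_db, a[m - 1])
    return a, n, nu, da_db, da_dW
```

Calling this once for each m = 1, ..., L yields the full Jacobian of the network outputs with respect to all biases and weights.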
If we additionally define the sensitivities of weighted sums with respect to network inputs,

\nu_{i,j}^{l,0} \triangleq \frac{\partial n_i^l}{\partial a_j^0},   (2.72)

then we obtain the derivatives of network outputs with respect to network inputs. First, we compute the additional sensitivities during the forward pass, i.e.,

\nu_{i,j}^{1,0} = w_{i,j}^{1}, \qquad
\nu_{i,j}^{l,0} = \sum_{k=1}^{S^{l-1}} w_{i,k}^{l}\, \varphi^{l-1\,\prime}(n_k^{l-1})\, \nu_{k,j}^{l-1,0}, \quad l = 2, \ldots, L.   (2.73)

Then, the derivatives of network outputs with respect to network inputs are expressed in terms of additional sensitivities, i.e.,

\frac{\partial a_i^L}{\partial a_j^0} = \varphi^{L\prime}(n_i^L)\, \nu_{i,j}^{L,0}.   (2.74)
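Eqs. (2.72)–(2.74) admit an analogous sketch for the Jacobian of the network outputs with respect to the network inputs; the identifiers and data layout are again assumptions carried over from the previous sketch.

```python
import numpy as np

def input_jacobian(W, b, phi, dphi, a0):
    """Jacobian of the network outputs with respect to the network inputs,
    obtained by propagating nu^{l,0} = d n^l / d a^0 forward
    (Eqs. (2.72)-(2.73)) and applying Eq. (2.74) at the output layer."""
    L = len(b) - 1
    n = W[1] @ a0 + b[1]                  # weighted sums n^1
    nu = W[1].copy()                      # nu^{1,0} = W^1, Eq. (2.73)
    for l in range(2, L + 1):
        nu = (W[l] * dphi(n)) @ nu        # recursion of Eq. (2.73)
        n = W[l] @ phi(n) + b[l]          # next weighted sums n^l, Eq. (2.8)
    return dphi(n)[:, None] * nu          # d a^L / d a^0, Eq. (2.74)
```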
Backpropagation algorithm for error gradient and Hessian [66]. First, we perform a forward pass to compute the weighted sums n_i^l and activations a_i^l according to Eqs. (2.8), and also to compute the pairwise sensitivities \nu_{i,j}^{l,m} according to (2.68)–(2.70).

We define the error function second-order sensitivities with respect to weighted sums as follows:

\delta_{i,j}^{l,m} \triangleq \frac{\partial^2 E}{\partial n_i^l\, \partial n_j^m}.   (2.75)

Next, during a backward pass we compute the error function sensitivities \delta_i^l as well as the second-order sensitivities \delta_{i,j}^{l,m}. According to Schwarz's theorem on the equality of mixed partials, due to the continuity of the second partial derivatives of the error function with respect to weighted sums, we have \delta_{i,j}^{l,m} = \delta_{j,i}^{m,l}. Hence, we need to compute the second-order sensitivities only for the case m \le l.
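To see where the output-layer expression below comes from, assume a weighted quadratic error of the form E = \tfrac{1}{2}\sum_i \omega_i (\tilde{y}_i - a_i^L)^2 (this form is suggested by the \omega_i and \tilde{y}_i appearing in Eq. (2.76); the exact definition of E is given earlier in the chapter and is an assumption here). Then the first-order output sensitivity is

\delta_i^L = \frac{\partial E}{\partial n_i^L} = -\,\omega_i\, (\tilde{y}_i - a_i^L)\, \varphi^{L\prime}(n_i^L),

and differentiating once more with respect to n_j^m, using \partial n_i^L / \partial n_j^m = \nu_{i,j}^{L,m} and \partial a_i^L / \partial n_j^m = \varphi^{L\prime}(n_i^L)\, \nu_{i,j}^{L,m}, yields exactly Eq. (2.76).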
Second-order sensitivities for the output layer neurons are obtained directly, i.e.,

\delta_{i,j}^{L,m} = \omega_i \left[ \varphi^{L\prime}(n_i^L)^2 - \bigl(\tilde{y}_i - a_i^L\bigr)\, \varphi^{L\prime\prime}(n_i^L) \right] \nu_{i,j}^{L,m},   (2.76)

while second-order sensitivities for the hidden layer neurons are computed during a backward pass, i.e.,

\delta_{i,j}^{l,m} = \varphi^{l\prime}(n_i^l) \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\, \delta_{k,j}^{l+1,m}
 + \varphi^{l\prime\prime}(n_i^l)\, \nu_{i,j}^{l,m} \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\, \delta_k^{l+1},
 \quad l = L-1, \ldots, 1.   (2.77)