
put (the scalar error) and multiple inputs; therefore reverse mode is significantly faster than forward mode. As shown in [65], under realistic assumptions the error function gradient can be computed in reverse mode at a cost of five function evaluations or less. Also note that in the ANN field the forward and reverse computation modes are usually referred to as forward propagation and backward propagation (or backpropagation).

In the rest of this subsection we present automatic differentiation algorithms for the computation of the gradient, Jacobian, and Hessian of the squared error function (2.58) in the case of a layered feedforward neural network (2.8). All these algorithms rely on the fact that the derivatives of the activation functions are known. For example, the derivatives of the hyperbolic tangent activation functions (2.9) are

$$\varphi_i^{l\,\prime}(n_i^l) = 1 - \bigl(\varphi_i^l(n_i^l)\bigr)^{2}, \qquad \varphi_i^{l\,\prime\prime}(n_i^l) = -2\,\varphi_i^l(n_i^l)\,\varphi_i^{l\,\prime}(n_i^l), \qquad l = 1,\ldots,L-1,\; i = 1,\ldots,S^l, \tag{2.59}$$

while the derivatives of a logistic function (2.10) equal

$$\varphi_i^{l\,\prime}(n_i^l) = \varphi_i^l(n_i^l)\bigl(1 - \varphi_i^l(n_i^l)\bigr), \qquad \varphi_i^{l\,\prime\prime}(n_i^l) = \varphi_i^{l\,\prime}(n_i^l)\bigl(1 - 2\,\varphi_i^l(n_i^l)\bigr), \qquad l = 1,\ldots,L-1,\; i = 1,\ldots,S^l. \tag{2.60}$$

Derivatives of the identity activation functions (2.11) are simply

$$\varphi_i^{L\,\prime}(n_i^L) = 1, \qquad \varphi_i^{L\,\prime\prime}(n_i^L) = 0, \qquad i = 1,\ldots,S^L. \tag{2.61}$$
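As a concrete illustration, the following sketch (not from the book; NumPy-based, with illustrative function names) evaluates the first and second derivatives (2.59)–(2.61) directly from the already-computed activation values, which is the property the algorithms below rely on.

```python
import numpy as np

# Sketch of (2.59)-(2.61): first and second derivatives of the activation
# functions, expressed through the already-computed activations a = phi(n).
# Function names are illustrative, not taken from the book.

def tanh_derivatives(a):
    """Hyperbolic tangent layer: a = tanh(n)."""
    d1 = 1.0 - a ** 2          # phi'(n)  = 1 - phi(n)^2           (2.59)
    d2 = -2.0 * a * d1         # phi''(n) = -2 phi(n) phi'(n)      (2.59)
    return d1, d2

def logistic_derivatives(a):
    """Logistic layer: a = 1 / (1 + exp(-n))."""
    d1 = a * (1.0 - a)         # phi'(n)  = phi(n)(1 - phi(n))     (2.60)
    d2 = d1 * (1.0 - 2.0 * a)  # phi''(n) = phi'(n)(1 - 2 phi(n))  (2.60)
    return d1, d2

def identity_derivatives(a):
    """Identity output layer: a = n."""
    return np.ones_like(a), np.zeros_like(a)   # (2.61)
```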
Backpropagation algorithm for error function gradient. First, we perform a forward pass to compute the weighted sums $n_i^l$ and activations $a_i^l$ for all neurons $i = 1,\ldots,S^l$ of each layer $l = 1,\ldots,L$, according to equations (2.8). We define the error function sensitivities with respect to the weighted sums $n_i^l$ to be as follows:

$$\delta_i^l \triangleq \frac{\partial E}{\partial n_i^l}. \tag{2.62}$$

Sensitivities for the output layer neurons are obtained directly, i.e.,

$$\delta_i^L = -\omega_i \bigl(\tilde{y}_i - a_i^L\bigr)\,\varphi_i^{L\,\prime}(n_i^L), \tag{2.63}$$

while sensitivities for the hidden layer neurons are computed during a backward pass:

$$\delta_i^l = \varphi_i^{l\,\prime}(n_i^l) \sum_{j=1}^{S^{l+1}} \delta_j^{l+1}\, w_{j,i}^{l+1}, \qquad l = L-1,\ldots,1. \tag{2.64}$$

Finally, the error function derivatives with respect to the parameters are expressed in terms of the sensitivities, i.e.,

$$\frac{\partial E}{\partial b_i^l} = \delta_i^l, \qquad \frac{\partial E}{\partial w_{i,j}^l} = \delta_i^l\, a_j^{l-1}. \tag{2.65}$$

In a similar manner, we can compute the derivatives with respect to the network inputs, i.e.,

$$\frac{\partial E}{\partial a_i^0} = \sum_{j=1}^{S^1} \delta_j^1\, w_{j,i}^1. \tag{2.66}$$
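To make the procedure concrete, here is a minimal NumPy sketch of the forward and backward passes (2.62)–(2.66). It assumes tanh hidden layers, an identity output layer, and an error of the form $E = \tfrac{1}{2}\sum_i \omega_i (\tilde{y}_i - a_i^L)^2$, which is consistent with (2.63); the function name and list-based parameter storage are illustrative, not from the book.

```python
import numpy as np

def forward_backward(weights, biases, a0, y_target, omega):
    """Gradient of E = 1/2 * sum_i omega_i * (y_i - a_i^L)^2 by backpropagation.
    weights[l-1], biases[l-1] hold the layer-l parameters w^l, b^l, l = 1,...,L."""
    L = len(weights)
    a, n = [a0], [None]
    # Forward pass (2.8): weighted sums n^l and activations a^l for l = 1,...,L.
    for l in range(L):
        n_l = weights[l] @ a[-1] + biases[l]
        a_l = n_l if l == L - 1 else np.tanh(n_l)   # identity output layer
        n.append(n_l)
        a.append(a_l)

    # Output-layer sensitivities (2.63); phi' = 1 for the identity output layer.
    delta = [None] * (L + 1)
    delta[L] = -omega * (y_target - a[L])

    # Backward pass (2.64): hidden-layer sensitivities.
    for l in range(L - 1, 0, -1):
        phi1 = 1.0 - a[l] ** 2                       # tanh derivative (2.59)
        delta[l] = phi1 * (weights[l].T @ delta[l + 1])

    # Parameter derivatives (2.65) and input derivatives (2.66).
    grad_b = [delta[l] for l in range(1, L + 1)]
    grad_w = [np.outer(delta[l], a[l - 1]) for l in range(1, L + 1)]
    grad_a0 = weights[0].T @ delta[1]
    return grad_w, grad_b, grad_a0
```

As a sanity check, the returned gradients can be compared against finite differences of $E$ with respect to the individual weights, biases, and inputs.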
Forward propagation for network outputs Jacobian. We define the pairwise sensitivities of the weighted sums to be as follows:

$$\nu_{i,j}^{l,m} \triangleq \frac{\partial n_i^l}{\partial n_j^m}. \tag{2.67}$$

Pairwise sensitivities for neurons of the same layer are obtained directly, i.e.,

$$\nu_{i,i}^{l,l} = 1, \qquad \nu_{i,j}^{l,l} = 0, \quad i \neq j. \tag{2.68}$$
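In matrix form, (2.68) simply fixes the same-layer blocks of the pairwise sensitivities as identity matrices. The sketch below (illustrative names, not from the book) sets up such storage; how the remaining blocks are propagated forward through the network is the subject of the forward propagation pass developed in the text that follows.

```python
import numpy as np

# Storage for nu[l][m][i, j] = d n_i^l / d n_j^m, see (2.67)-(2.68).
def init_pairwise_sensitivities(layer_sizes):
    """layer_sizes[l-1] = S^l, the number of neurons in layer l = 1,...,L."""
    L = len(layer_sizes)
    nu = [[None] * (L + 1) for _ in range(L + 1)]    # 1-based indexing in l, m
    for l in range(1, L + 1):
        nu[l][l] = np.eye(layer_sizes[l - 1])        # nu^{l,l} = I per (2.68)
    return nu
```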