Page 76 - Neural Network Modeling and Identification of Dynamical Systems

64                2. DYNAMIC NEURAL NETWORKS: STRUCTURES AND TRAINING METHODS

follows:
$$\lambda(t_k) = \frac{\partial E(W)}{\partial z(t_k)}. \tag{2.86}$$

Error function sensitivities are computed during a backward-in-time pass, i.e.,
$$\begin{aligned}
\lambda(t_{K+1}) &= 0,\\
\lambda(t_k) &= \frac{\partial e(\tilde y(t_k), z(t_k), W)}{\partial z}
+ \left(\frac{\partial F(z(t_k), u(t_k), W)}{\partial z}\right)^{\!T} \lambda(t_{k+1}),\\
k &= K, \ldots, 1.
\end{aligned} \tag{2.87}$$

Finally, the error function derivatives with respect to the parameters are expressed in terms of the sensitivities, i.e.,
$$\frac{\partial E(W)}{\partial W} = \sum_{k=1}^{K} \left[
\frac{\partial e(\tilde y(t_k), z(t_k), W)}{\partial W}
+ \left(\frac{\partial F(z(t_{k-1}), u(t_{k-1}), W)}{\partial W}\right)^{\!T} \lambda(t_k)
\right]. \tag{2.88}$$

First-order derivatives of the instantaneous error function (2.85) have the form
$$\begin{aligned}
\frac{\partial e(\tilde y, z, W)}{\partial W} &= -\left(\frac{\partial G(z, W)}{\partial W}\right)^{\!T} \bigl(\tilde y - G(z, W)\bigr),\\
\frac{\partial e(\tilde y, z, W)}{\partial z} &= -\left(\frac{\partial G(z, W)}{\partial z}\right)^{\!T} \bigl(\tilde y - G(z, W)\bigr).
\end{aligned} \tag{2.89}$$

Since the mappings F and G are represented by layered feedforward neural networks, their derivatives can be computed as described in Section 2.2.2.

Real-Time Recurrent Learning (RTRL) algorithm [68–70] for the network output Jacobian. The model state sensitivities with respect to the network parameters are computed during a forward-in-time pass, along with the states themselves. We have
$$\begin{aligned}
\frac{\partial z(t_0)}{\partial W} &= 0,\\
\frac{\partial z(t_k)}{\partial W} &= \frac{\partial F(z(t_{k-1}), u(t_{k-1}), W)}{\partial W}
+ \frac{\partial F(z(t_{k-1}), u(t_{k-1}), W)}{\partial z} \, \frac{\partial z(t_{k-1})}{\partial W},\\
k &= 1, \ldots, K.
\end{aligned} \tag{2.90}$$

The gradient of the individual trajectory error function (2.84) equals
$$\frac{\partial E(W)}{\partial W} = \sum_{k=1}^{K} \left[
\frac{\partial e(\tilde y(t_k), z(t_k), W)}{\partial W}
+ \left(\frac{\partial z(t_k)}{\partial W}\right)^{\!T} \frac{\partial e(\tilde y(t_k), z(t_k), W)}{\partial z}
\right]. \tag{2.91}$$

A Gauss–Newton Hessian approximation may be obtained as follows:
$$\begin{aligned}
\frac{\partial^2 E(W)}{\partial W^2} \approx \sum_{k=1}^{K} \Biggl[\,
&\frac{\partial^2 e(\tilde y(t_k), z(t_k), W)}{\partial W^2}
+ \frac{\partial^2 e(\tilde y(t_k), z(t_k), W)}{\partial W \, \partial z} \, \frac{\partial z(t_k)}{\partial W}\\
&+ \left(\frac{\partial z(t_k)}{\partial W}\right)^{\!T} \frac{\partial^2 e(\tilde y(t_k), z(t_k), W)}{\partial z \, \partial W}
+ \left(\frac{\partial z(t_k)}{\partial W}\right)^{\!T} \frac{\partial^2 e(\tilde y(t_k), z(t_k), W)}{\partial z^2} \, \frac{\partial z(t_k)}{\partial W}
\Biggr]. \end{aligned} \tag{2.92}$$

The corresponding approximations to the second-order derivatives of the instantaneous error function have the form
$$\begin{aligned}
\frac{\partial^2 e(\tilde y, z, W)}{\partial W^2} &\approx \left(\frac{\partial G(z, W)}{\partial W}\right)^{\!T} \frac{\partial G(z, W)}{\partial W},\\
\frac{\partial^2 e(\tilde y, z, W)}{\partial W \, \partial z} &\approx \left(\frac{\partial G(z, W)}{\partial W}\right)^{\!T} \frac{\partial G(z, W)}{\partial z},\\
\frac{\partial^2 e(\tilde y, z, W)}{\partial z^2} &\approx \left(\frac{\partial G(z, W)}{\partial z}\right)^{\!T} \frac{\partial G(z, W)}{\partial z}.
\end{aligned} \tag{2.93}$$
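To make the two gradient computations concrete, here is a minimal numerical sketch of both the backward-in-time pass (2.87)–(2.88) and the forward-in-time RTRL recursion (2.90)–(2.91). It assumes a small hypothetical model with state map $F(z,u,W) = \tanh(W_z z + W_u u)$, linear output $G(z) = W_y z$, and squared error $e = \tfrac12\|\tilde y - G(z)\|^2$; only the recurrent matrix `Wz` plays the role of the parameter vector $W$, so $\partial e/\partial W$ and $\partial G/\partial W$ vanish. All names and dimensions (`Wz`, `Wu`, `Wy`, `n`, `m`, `K`) are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 3, 2, 12                      # state dim, input dim, horizon (illustrative)
Wz = 0.5 * rng.standard_normal((n, n))  # recurrent weights: the trained parameters W
Wu = 0.5 * rng.standard_normal((n, m))  # input weights (held fixed here)
Wy = rng.standard_normal((1, n))        # output weights (held fixed here)
u = rng.standard_normal((K, m))         # inputs u(t_0), ..., u(t_{K-1})
y_ref = rng.standard_normal((K, 1))     # references ~y(t_1), ..., ~y(t_K)

def rollout(Wz):
    """States z(t_0), ..., z(t_K) with z(t_k) = F(z(t_{k-1}), u(t_{k-1}), W)."""
    zs = [np.zeros(n)]
    for k in range(K):
        zs.append(np.tanh(Wz @ zs[k] + Wu @ u[k]))
    return zs

def total_error(Wz):
    """Trajectory error E(W) = sum_k (1/2) ||~y(t_k) - G(z(t_k))||^2."""
    zs = rollout(Wz)
    return 0.5 * sum(float(((y_ref[k - 1] - Wy @ zs[k]) ** 2).sum())
                     for k in range(1, K + 1))

def grad_bptt(Wz):
    """Backward-in-time pass, Eqs. (2.87)-(2.88)."""
    zs = rollout(Wz)
    lam = np.zeros(n)                   # lambda(t_{K+1}) = 0
    g = np.zeros_like(Wz)
    for k in range(K, 0, -1):
        r = (y_ref[k - 1] - Wy @ zs[k]).ravel()
        de_dz = -(Wy.T @ r)             # Eq. (2.89); de/dW = 0 since G has no Wz
        if k < K:                       # dF(z(t_k),u(t_k),W)/dz = diag(1-z(t_{k+1})^2) Wz
            lam = de_dz + ((1 - zs[k + 1] ** 2)[:, None] * Wz).T @ lam
        else:
            lam = de_dz
        # (dF(z(t_{k-1}), u(t_{k-1}), W)/dW)^T lambda(t_k), Eq. (2.88):
        g += np.outer((1 - zs[k] ** 2) * lam, zs[k - 1])
    return g

def grad_rtrl(Wz):
    """Forward-in-time sensitivity pass, Eqs. (2.90)-(2.91)."""
    z, S = np.zeros(n), np.zeros((n, n * n))   # S = dz(t_0)/dW = 0
    g = np.zeros(n * n)
    for k in range(1, K + 1):
        z_new = np.tanh(Wz @ z + Wu @ u[k - 1])
        d = 1 - z_new ** 2                     # tanh' at the pre-activation
        dF_dW = np.zeros((n, n * n))           # explicit dF/dW term of Eq. (2.90)
        for i in range(n):
            dF_dW[i, i * n:(i + 1) * n] = d[i] * z
        S = dF_dW + (d[:, None] * Wz) @ S      # Eq. (2.90)
        z = z_new
        r = (y_ref[k - 1] - Wy @ z).ravel()
        g += S.T @ (-(Wy.T @ r))               # Eq. (2.91)
    return g.reshape(n, n)
```

Both passes return the same gradient of the trajectory error. The trade-off: BPTT must store the full state trajectory before running backward, whereas RTRL carries the $n \times n^2$ sensitivity matrix $\partial z(t_k)/\partial W$ forward and needs no stored trajectory, at a higher per-step cost.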
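The Gauss–Newton approximation (2.92)–(2.93) can likewise be sketched numerically. The sketch below assumes a hypothetical model $z(t_k) = \tanh(W_z z(t_{k-1}) + W_u u(t_{k-1}))$, $G(z) = W_y z$, with only `Wz` trained (all names and sizes illustrative): then $\partial G/\partial W = 0$, the first three terms of (2.92) vanish under the approximations (2.93), and the approximate Hessian accumulates $S^T (\partial G/\partial z)^T (\partial G/\partial z)\, S$ from the RTRL sensitivities $S = \partial z(t_k)/\partial W$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, K = 3, 2, 12                      # illustrative dimensions and horizon
Wz = 0.5 * rng.standard_normal((n, n))  # trained parameters W
Wu = 0.5 * rng.standard_normal((n, m))  # held fixed
Wy = rng.standard_normal((1, n))        # held fixed, so dG/dW = 0
u = rng.standard_normal((K, m))

def gauss_newton_hessian(Wz):
    """Eq. (2.92) with the approximations (2.93); since dG/dW = 0 here,
    only the S^T (dG/dz)^T (dG/dz) S term contributes."""
    z, S = np.zeros(n), np.zeros((n, n * n))  # S = dz(t_0)/dW = 0, Eq. (2.90)
    H = np.zeros((n * n, n * n))
    for k in range(K):
        z_new = np.tanh(Wz @ z + Wu @ u[k])
        d = 1 - z_new ** 2
        dF_dW = np.zeros((n, n * n))          # explicit dF/dW term
        for i in range(n):
            dF_dW[i, i * n:(i + 1) * n] = d[i] * z
        S = dF_dW + (d[:, None] * Wz) @ S     # RTRL recursion, Eq. (2.90)
        z = z_new
        J = Wy @ S                            # dG(z(t_k))/dW = (dG/dz) S
        H += J.T @ J                          # Gauss-Newton accumulation
    return H

H = gauss_newton_hessian(Wz)
```

By construction the result is a symmetric positive semidefinite matrix, which is what makes the Gauss–Newton approximation attractive for second-order training methods: it discards the residual-weighted curvature terms that can make the exact Hessian indefinite.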