Page 74 - Neural Network Modeling and Identification of Dynamical Systems

62                2. DYNAMIC NEURAL NETWORKS: STRUCTURES AND TRAINING METHODS

Due to continuity of second partial derivatives of an error function with respect to network parameters, the Hessian matrix is symmetric. Therefore, we need to compute only the lower-triangular part of the Hessian matrix. The error function second derivatives with respect to parameters are expressed in terms of second-order sensitivities. We have

\[
\begin{aligned}
\frac{\partial^2 E}{\partial b_i^l \,\partial b_k^m} &= \delta_{i,k}^{l,m}, \\
\frac{\partial^2 E}{\partial b_i^l \,\partial w_{k,r}^m} &= \delta_{i,k}^{l,m}\, a_r^{m-1}, \\
\frac{\partial^2 E}{\partial w_{i,j}^l \,\partial b_k^m} &= \delta_{i,k}^{l,m}\, a_j^{l-1} + \delta_i^l\, {\varphi_j^{l-1}}'(n_j^{l-1})\, \nu_{j,k}^{l-1,m}, \quad l > 1, \\
\frac{\partial^2 E}{\partial w_{i,j}^1 \,\partial b_k^1} &= \delta_{i,k}^{1,1}\, a_j^0, \\
\frac{\partial^2 E}{\partial w_{i,j}^l \,\partial w_{k,r}^m} &= \delta_{i,k}^{l,m}\, a_j^{l-1} a_r^{m-1} + \delta_i^l\, {\varphi_j^{l-1}}'(n_j^{l-1})\, \nu_{j,k}^{l-1,m}\, a_r^{m-1}, \quad l > 1, \\
\frac{\partial^2 E}{\partial w_{i,j}^1 \,\partial w_{k,r}^1} &= \delta_{i,k}^{1,1}\, a_j^0 a_r^0.
\end{aligned}
\tag{2.78}
\]

If we additionally define the second-order sensitivities of the error function with respect to network inputs,

\[
\delta_{i,j}^{l,0} \triangleq \frac{\partial^2 E}{\partial n_i^l \,\partial a_j^0},
\tag{2.79}
\]

then we obtain error function second derivatives with respect to network inputs. First, we compute the additional second-order sensitivities during the backward pass, i.e.,

\[
\begin{aligned}
\delta_{i,j}^{L,0} &= \omega_i \left[ \left( {\varphi_i^L}'(n_i^L) \right)^2 - \left( \tilde{y}_i - a_i^L \right) {\varphi_i^L}''(n_i^L) \right] \nu_{i,j}^{L,0}, \\
\delta_{i,j}^{l,0} &= {\varphi_i^l}'(n_i^l) \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\, \delta_{k,j}^{l+1,0} + {\varphi_i^l}''(n_i^l)\, \nu_{i,j}^{l,0} \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\, \delta_k^{l+1}, \quad l = L-1, \ldots, 1.
\end{aligned}
\tag{2.80}
\]

Then, the second derivatives of the error function with respect to network inputs are expressed in terms of additional second-order sensitivities, i.e.,

\[
\begin{aligned}
\frac{\partial^2 E}{\partial a_i^0 \,\partial a_j^0} &= \sum_{k=1}^{S^1} w_{k,i}^1\, \delta_{k,j}^{1,0}, \\
\frac{\partial^2 E}{\partial b_i^l \,\partial a_k^0} &= \delta_{i,k}^{l,0}, \\
\frac{\partial^2 E}{\partial w_{i,j}^l \,\partial a_k^0} &= \delta_{i,k}^{l,0}\, a_j^{l-1} + \delta_i^l\, {\varphi_j^{l-1}}'(n_j^{l-1})\, \nu_{j,k}^{l-1,0}, \quad l > 1, \\
\frac{\partial^2 E}{\partial w_{i,j}^1 \,\partial a_j^0} &= \delta_{i,j}^{1,0}\, a_j^0 + \delta_i^1, \\
\frac{\partial^2 E}{\partial w_{i,j}^1 \,\partial a_k^0} &= \delta_{i,k}^{1,0}\, a_j^0, \quad j \neq k.
\end{aligned}
\tag{2.81}
\]

2.2.3 Dynamic Neural Network Training

Traditional dynamic neural networks, such as the NARX and Elman networks, represent controlled discrete time dynamical systems. Thus, it is natural to utilize them as models for discrete time dynamical systems. However, they can also be used as models for the continuous time dynamical systems under the assumption of a uniform time step $\Delta t$. In this book we focus on the latter problem. That is, we wish to train the dynamic neural network so that it can perform