Backpropagation through time algorithm for error gradient and Hessian. Second-order sensitivities of the error function are computed during a backward-in-time pass as follows:

$$
\begin{aligned}
\frac{\partial \lambda(t_{K+1})}{\partial W} ={}& 0,\\
\frac{\partial \lambda(t_k)}{\partial W} ={}& \frac{\partial^2 e(\tilde{y}(t_k), z(t_k), W)}{\partial z\,\partial W}
+ \frac{\partial^2 e(\tilde{y}(t_k), z(t_k), W)}{\partial z^2}\,\frac{\partial z(t_k)}{\partial W}\\
&+ \sum_{i=1}^{n_z} \lambda_i(t_{k+1}) \left[ \frac{\partial^2 F_i(z(t_k), u(t_k), W)}{\partial z\,\partial W}
+ \frac{\partial^2 F_i(z(t_k), u(t_k), W)}{\partial z^2}\,\frac{\partial z(t_k)}{\partial W} \right]\\
&+ \left( \frac{\partial F(z(t_k), u(t_k), W)}{\partial z} \right)^{T} \frac{\partial \lambda(t_{k+1})}{\partial W},
\qquad k = K, \ldots, 1.
\end{aligned}
\tag{2.94}
$$
The Hessian of the individual trajectory error function (2.84) equals

$$
\begin{aligned}
\frac{\partial^2 E(W)}{\partial W^2} = \sum_{k=1}^{K} \Biggl(
& \frac{\partial^2 e(\tilde{y}(t_k), z(t_k), W)}{\partial W^2}
+ \frac{\partial^2 e(\tilde{y}(t_k), z(t_k), W)}{\partial W\,\partial z}\,\frac{\partial z(t_k)}{\partial W}\\
&+ \sum_{i=1}^{n_z} \lambda_i(t_k) \left[ \frac{\partial^2 F_i(z(t_{k-1}), u(t_{k-1}), W)}{\partial W^2}
+ \frac{\partial^2 F_i(z(t_{k-1}), u(t_{k-1}), W)}{\partial W\,\partial z}\,\frac{\partial z(t_{k-1})}{\partial W} \right]\\
&+ \left( \frac{\partial F(z(t_{k-1}), u(t_{k-1}), W)}{\partial W} \right)^{T} \frac{\partial \lambda(t_k)}{\partial W}
\Biggr).
\end{aligned}
\tag{2.95}
$$
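To make the structure of these recursions concrete, here is a minimal NumPy sketch (not from the book) of the backward pass (2.94) feeding the accumulation (2.95), for a deliberately simple model in which every second derivative is available in closed form and most of them vanish: linear dynamics F(z, u, W) = Az + u with W = vec(A), identity readout G(z, W) = z, and unit weights ω_i = 1 in the instantaneous error, whose second-order derivatives are given next in (2.96). All dimensions and helper names are illustrative assumptions; the result is checked against finite differences of the BPTT gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_z, K = 3, 6
n_w = n_z * n_z                          # W = vec(A)
u = rng.normal(size=(K, n_z))
y_tilde = rng.normal(size=(K + 1, n_z))  # y_tilde[k] targets z(t_k); entry 0 unused
z0 = rng.normal(size=n_z)

def dF_dW(z):
    """dF_i/dA_jk = delta_ij z_k, laid out as an (n_z x n_w) matrix."""
    out = np.zeros((n_z, n_w))
    for i in range(n_z):
        out[i, i * n_z:(i + 1) * n_z] = z
    return out

def trajectory(w):
    """Forward pass z(t_k) = F(z(t_{k-1}), u(t_{k-1}), W) = A z + u."""
    A = w.reshape(n_z, n_z)
    zs = [z0]
    for k in range(K):
        zs.append(A @ zs[k] + u[k])
    return zs

def gradient(w):
    """First-order BPTT: lam(t_k) = de/dz + (dF/dz)^T lam(t_{k+1})."""
    A = w.reshape(n_z, n_z)
    zs = trajectory(w)
    lam, g = np.zeros(n_z), np.zeros(n_w)
    for k in range(K, 0, -1):
        lam = -(y_tilde[k] - zs[k]) + A.T @ lam
        g += dF_dW(zs[k - 1]).T @ lam
    return g

def hessian(w):
    """Second-order BPTT: recursion (2.94) feeding accumulation (2.95)."""
    A = w.reshape(n_z, n_z)
    zs = trajectory(w)
    S = [np.zeros((n_z, n_w))]           # S[k] = dz(t_k)/dW, forward sensitivities
    for k in range(1, K + 1):
        S.append(dF_dW(zs[k - 1]) + A @ S[k - 1])
    I = np.eye(n_z)
    lam = np.zeros(n_z)                  # lam(t_{K+1}) = 0
    Lam = np.zeros((n_z, n_w))           # dlam(t_{K+1})/dW = 0, Eq. (2.94)
    H = np.zeros((n_w, n_w))
    for k in range(K, 0, -1):
        # Eq. (2.94); for this model d2e/dzdW = 0, d2e/dz2 = I, d2F_i/dz2 = 0,
        # and sum_i lam_i(t_{k+1}) d2F_i/dzdW = kron(lam^T, I).
        Lam_k = S[k] + np.kron(lam[None, :], I) + A.T @ Lam
        lam_k = -(y_tilde[k] - zs[k]) + A.T @ lam
        # Eq. (2.95); for this model d2e/dW2 = d2e/dWdz = d2F_i/dW2 = 0,
        # and sum_i lam_i(t_k) d2F_i/dWdz = kron(lam, I).
        H += np.kron(lam_k[:, None], I) @ S[k - 1] + dF_dW(zs[k - 1]).T @ Lam_k
        lam, Lam = lam_k, Lam_k
    return H

w0 = (0.5 * rng.normal(size=(n_z, n_z))).ravel()
H = hessian(w0)

# Check (2.94)-(2.95) against central differences of the BPTT gradient.
eps, H_fd = 1e-6, np.zeros((n_w, n_w))
for a in range(n_w):
    d = np.zeros(n_w); d[a] = eps
    H_fd[:, a] = (gradient(w0 + d) - gradient(w0 - d)) / (2 * eps)
print(np.max(np.abs(H - H_fd)))          # agrees to roughly 1e-8
```

For general nonlinear F and G the passes have exactly the same structure; only the closed-form second derivatives noted in the comments are replaced by the corresponding terms of (2.94)–(2.96).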
Second-order derivatives of the instantaneous error function (2.85) have the form

$$
\begin{aligned}
\frac{\partial^2 e(\tilde{y}, z, W)}{\partial W^2} &= \left( \frac{\partial G(z,W)}{\partial W} \right)^{T} \frac{\partial G(z,W)}{\partial W}
- \sum_{i=1}^{n_y} \frac{\partial^2 G_i(z,W)}{\partial W^2}\,\omega_i \bigl( \tilde{y}_i - G_i(z,W) \bigr),\\
\frac{\partial^2 e(\tilde{y}, z, W)}{\partial W\,\partial z} &= \left( \frac{\partial G(z,W)}{\partial W} \right)^{T} \frac{\partial G(z,W)}{\partial z}
- \sum_{i=1}^{n_y} \frac{\partial^2 G_i(z,W)}{\partial W\,\partial z}\,\omega_i \bigl( \tilde{y}_i - G_i(z,W) \bigr),\\
\frac{\partial^2 e(\tilde{y}, z, W)}{\partial z^2} &= \left( \frac{\partial G(z,W)}{\partial z} \right)^{T} \frac{\partial G(z,W)}{\partial z}
- \sum_{i=1}^{n_y} \frac{\partial^2 G_i(z,W)}{\partial z^2}\,\omega_i \bigl( \tilde{y}_i - G_i(z,W) \bigr).
\end{aligned}
\tag{2.96}
$$
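The following sketch (an illustration, not from the book) evaluates the ∂²e/∂W² block of (2.96) for a hypothetical readout G(z, W) = tanh(Vz) with w = vec(V), assuming the instantaneous error (2.85) is e = ½ Σᵢ ωᵢ(ỹᵢ − Gᵢ)²; with these weights the first (Gauss–Newton) term carries diag(ω), which reduces to the plain matrix product in (2.96) when all ωᵢ = 1. A finite-difference check of the analytic gradient confirms the block.

```python
import numpy as np

rng = np.random.default_rng(1)
n_y, n_z = 2, 3
n_w = n_y * n_z                          # w = vec(V)
z = rng.normal(size=n_z)
y_tilde = rng.normal(size=n_y)
omega = np.array([1.0, 2.0])

def G(w):
    """Hypothetical readout G(z, W) = tanh(V z) with w = vec(V)."""
    return np.tanh(w.reshape(n_y, n_z) @ z)

def jac_G_w(w):
    """dG_i/dV_jk = delta_ij (1 - G_i^2) z_k, as an (n_y x n_w) matrix."""
    s = 1.0 - G(w) ** 2
    J = np.zeros((n_y, n_w))
    for i in range(n_y):
        J[i, i * n_z:(i + 1) * n_z] = s[i] * z
    return J

def hess_G_w(w):
    """d2G_i/dw2: nonzero only within row i's block of V."""
    g = G(w)
    H = np.zeros((n_y, n_w, n_w))
    for i in range(n_y):
        blk = -2.0 * g[i] * (1.0 - g[i] ** 2) * np.outer(z, z)
        H[i, i * n_z:(i + 1) * n_z, i * n_z:(i + 1) * n_z] = blk
    return H

def grad_e(w):
    """de/dw for e = 1/2 sum_i omega_i (y_tilde_i - G_i)^2."""
    return -jac_G_w(w).T @ (omega * (y_tilde - G(w)))

# The dW^2 block of Eq. (2.96): outer-product (Gauss-Newton) term minus the
# residual-weighted curvature of G.
w0 = rng.normal(size=n_w)
J, Hg, r = jac_G_w(w0), hess_G_w(w0), y_tilde - G(w0)
d2e_dw2 = J.T @ (omega[:, None] * J) - np.einsum('i,iab->ab', omega * r, Hg)

# Finite-difference check against the analytic gradient.
eps, num = 1e-6, np.zeros((n_w, n_w))
for a in range(n_w):
    d = np.zeros(n_w); d[a] = eps
    num[:, a] = (grad_e(w0 + d) - grad_e(w0 - d)) / (2 * eps)
print(np.max(np.abs(num - d2e_dw2)))     # roughly 1e-9
```

Near a good fit the residuals ỹᵢ − Gᵢ are small, so the curvature term is often dropped, leaving the positive semidefinite Gauss–Newton part.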
In the rest of this subsection, we discuss various difficulties associated with the recurrent neural network training problem. First, notice that a recurrent neural network which performs a K-step-ahead prediction may be “unfolded” in time to produce an equivalent layered feedforward neural network, comprised of K copies of the same subnetwork, one per time step. Each of these identical subnetworks shares a common set of parameters.
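The equivalence is easy to see in code. The sketch below (an illustration, with a hypothetical tanh cell) applies one recurrent cell K times and, alternatively, evaluates the unfolded K-layer feedforward network built from K copies of that cell with shared parameters; the outputs coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n_z, n_u, K = 3, 2, 5
A = 0.5 * rng.normal(size=(n_z, n_z))    # shared cell parameters
B = 0.5 * rng.normal(size=(n_z, n_u))
u = rng.normal(size=(K, n_u))
z0 = rng.normal(size=n_z)

def F(z, u_k):
    """One recurrent step, z(t_{k+1}) = F(z(t_k), u(t_k), W)."""
    return np.tanh(A @ z + B @ u_k)

# Recurrent form: one cell applied K times.
z = z0
for k in range(K):
    z = F(z, u[k])

# Unfolded form: a K-layer feedforward network whose layers are K copies of
# the same subnetwork, all sharing the parameter set (A, B).
layers = [F] * K
z_unfolded = z0
for layer, u_k in zip(layers, u):
    z_unfolded = layer(z_unfolded, u_k)

print(np.allclose(z, z_unfolded))        # True: the two networks coincide
```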
Given a large prediction horizon, the resulting feedforward network becomes very deep. Thus, it is natural that all the difficulties associated with deep neural network training are also inherent to recurrent neural network training. In fact, these problems become even more severe. They include the following:

1. Vanishing and exploding gradients [71–74]. Note that the sensitivity of the state of a recurrent neural network (2.13) at time step t_k with respect to its state at time step t_l (l < k) has the following form (a numerical illustration follows this item):

$$
\frac{\partial z(t_k)}{\partial z(t_l)} = \prod_{r=l}^{k-1} \frac{\partial F(z(t_r), u(t_r), W)}{\partial z}. \tag{2.97}
$$
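As a numerical illustration (not from the book), take a cell that is linear in the state, F(z, u, W) = Wz + u, so that every factor in (2.97) equals W and the sensitivity over k − l steps is simply W^(k−l); its norm then decays or grows geometrically depending on the spectral radius of W.

```python
import numpy as np

rng = np.random.default_rng(2)
n_z = 4
W = rng.normal(size=(n_z, n_z))

def with_spectral_radius(M, rho):
    """Rescale M so that its spectral radius equals rho."""
    return M * (rho / max(abs(np.linalg.eigvals(M))))

# For a state-linear cell F(z, u, W) = W z + u, each factor of Eq. (2.97)
# equals W, so dz(t_k)/dz(t_l) = W^(k - l).
for rho in (0.9, 1.1):
    J = with_spectral_radius(W, rho)
    for horizon in (20, 40, 80):
        sens = np.linalg.matrix_power(J, horizon)
        print(f"rho={rho}: ||dz(t_k)/dz(t_l)|| over {horizon} steps = "
              f"{np.linalg.norm(sens, 2):.2e}")
# rho = 0.9: the norm shrinks geometrically (vanishing gradients);
# rho = 1.1: it grows geometrically (exploding gradients).
```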