accurate closed-loop multistep-ahead prediction of the dynamical system behavior. In this subsection, we discuss the most general state space form (2.13) of dynamic neural networks.

Assume we are given an experimental data set of the form

$$
\left\{ \left\{ u^{(p)}(t_k),\ \tilde{y}^{(p)}(t_k) \right\}_{k=0}^{K^{(p)}} \right\}_{p=1}^{P}, \tag{2.82}
$$

where $P$ is the total number of trajectories, $K^{(p)}$ is the number of time steps for the corresponding trajectory, $t_k = k\,\Delta t$ are the discrete time instants, $u^{(p)}(t_k)$ are the control inputs, and $\tilde{y}^{(p)}(t_k)$ are the observed outputs. We will also denote the total duration of the $p$th trajectory by $\bar{t}^{(p)} = K^{(p)}\,\Delta t$.
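As a concrete illustration (not taken from the text), the training set (2.82) could be stored as a list of per-trajectory records; the names `Trajectory`, `u`, `y_tilde`, and `dt` below are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One experimental trajectory: samples at t_k = k * dt, k = 0..K."""
    u: np.ndarray        # control inputs u(t_k),        shape (K + 1, n_u)
    y_tilde: np.ndarray  # noisy observed outputs,       shape (K + 1, n_y)
    dt: float            # sampling interval

    @property
    def K(self) -> int:
        # number of time steps K^(p)
        return self.u.shape[0] - 1

    @property
    def duration(self) -> float:
        # total duration t_bar^(p) = K^(p) * dt
        return self.K * self.dt

# The full data set (2.82) is simply a list of P trajectories.
dataset: list[Trajectory] = [
    Trajectory(u=np.zeros((101, 1)), y_tilde=np.zeros((101, 2)), dt=0.02),
]
```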
Note that in general the observed outputs $\tilde{y}^{(p)}(t_k)$ do not match the true outputs $y^{(p)}(t_k)$. We assume that the observations are corrupted by additive white Gaussian noise, i.e.,

$$
\tilde{y}^{(p)}(t) = y^{(p)}(t) + \eta^{(p)}(t). \tag{2.83}
$$

That is, $\eta^{(p)}(t)$ represents a stationary Gaussian process with zero mean and covariance function $K_\eta(t_1, t_2) = \delta(t_2 - t_1)\,\operatorname{diag}\bigl(\sigma_1^2, \ldots, \sigma_{n_y}^2\bigr)$.
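If synthetic observations are needed for testing, the noise model (2.83) with the diagonal covariance above can be simulated along the following lines; the function `observe` and its arguments are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(y: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Corrupt true outputs y (shape (K + 1, n_y)) with white Gaussian
    noise of covariance diag(sigma**2), as in (2.83)."""
    eta = rng.normal(loc=0.0, scale=sigma, size=y.shape)
    return y + eta
```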
The individual errors $E^{(p)}$ for each trajectory have the following form:

$$
E^{(p)}(W) = \sum_{k=1}^{K^{(p)}} e\bigl(\tilde{y}^{(p)}(t_k),\, z^{(p)}(t_k),\, W\bigr), \tag{2.84}
$$

where $z^{(p)}(t_k)$ are the model states and $e^{(p)}\colon \mathbb{R}^{n_y} \times \mathbb{R}^{n_z} \times \mathbb{R}^{n_w} \to \mathbb{R}$ represents the model prediction error at time instant $t_k$. Under the abovementioned assumptions on the observation noise, it is reasonable to utilize the instantaneous error function $e$ of the following form:

$$
e(\tilde{y}, z, W) = \frac{1}{2}\bigl(\tilde{y} - G(z, W)\bigr)^{\mathsf{T}}\, \Omega\, \bigl(\tilde{y} - G(z, W)\bigr), \tag{2.85}
$$

where $\Omega = \operatorname{diag}(\omega_1, \ldots, \omega_{n_y})$ is the diagonal matrix of error weights, usually taken inversely proportional to the corresponding variances of the measurement noise.
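A minimal NumPy sketch of (2.84)-(2.85) might look as follows; the already-evaluated model outputs `g` (i.e., $G(z(t_k), W)$) and the weight vector `omega` are assumed inputs of this example rather than anything defined in the text.

```python
import numpy as np

def instantaneous_error(y_tilde_k, g_k, omega):
    """e(y~, z, W) = 1/2 (y~ - G(z, W))^T Omega (y~ - G(z, W)), Eq. (2.85).
    `g_k` is the already-evaluated model output G(z(t_k), W);
    `omega` holds the diagonal weights, e.g. omega = 1.0 / sigma**2."""
    r = y_tilde_k - g_k
    return 0.5 * r @ (omega * r)

def trajectory_error(y_tilde, g, omega):
    """E^(p)(W) = sum of instantaneous errors over k = 1..K, Eq. (2.84).
    `y_tilde` and `g` have shape (K + 1, n_y); index 0 corresponds to t_0."""
    return sum(instantaneous_error(y_tilde[k], g[k], omega)
               for k in range(1, y_tilde.shape[0]))
```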
We need to minimize the total prediction error $\bar{E}$ with respect to the neural network parameters $W$. Again, the minimization can be carried out using any of the optimization methods described in Section 2.2.1, provided we can compute the gradient and Hessian of the error function with respect to the parameters. Just as in the case of static neural networks, the total error gradient $\nabla\bar{E}$ and Hessian $\nabla^2\bar{E}$ may be expressed in terms of the individual error gradients $\nabla E^{(p)}$ and Hessians $\nabla^2 E^{(p)}$. Thus, we describe the algorithms for computation of the derivatives of $E^{(p)}$ and omit the trajectory index $p$.

Again, we have two different computation modes: forward-in-time and reverse-in-time, each with its own advantages and disadvantages. The forward-in-time approach theoretically allows one to work with trajectories of infinite duration, i.e., to perform online adaptation as new data arrive. In practice, however, each of its iterations is more computationally expensive than an iteration of the reverse-in-time approach. The reverse-in-time approach is applicable only when the whole training set is available beforehand, but it works significantly faster.

Backpropagation through time algorithm (BPTT) [67–69] for the error function gradient. First, we perform a forward pass to compute the predicted states $z(t_k)$ for all time steps $t_k$, $k = 1, \ldots, K$, according to equations (2.13). We also compute the error $E(W)$ according to (2.84) and (2.85).
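A rough sketch of this forward pass is given below; the state transition map `F` and output map `G` stand in for the state space model (2.13) with the parameters $W$ fixed, and are assumptions of this example, not definitions from the text.

```python
import numpy as np

def bptt_forward_pass(u, y_tilde, z0, F, G, omega):
    """Forward pass of BPTT: propagate the model states z(t_k) through the
    state space recursion and accumulate the trajectory error (2.84)-(2.85).
    F(z, u) and G(z) are placeholders for the trained mappings of (2.13)
    with the parameters W already bound; they are not defined in the text.
    Returns the state sequence (needed for the backward pass) and E(W)."""
    K = u.shape[0] - 1
    z = [z0]
    error = 0.0
    for k in range(1, K + 1):
        z_k = F(z[k - 1], u[k - 1])      # state update giving z(t_k)
        r = y_tilde[k] - G(z_k)          # output residual y~ - G(z, W)
        error += 0.5 * r @ (omega * r)   # instantaneous error (2.85)
        z.append(z_k)
    return z, error
```

The stored state sequence is precisely what the subsequent backward pass consumes when propagating the sensitivities introduced next.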
We define the error function sensitivities with respect to the model states at time step $t_k$ to be as