Page 143 - Neural Network Modeling and Identification of Dynamical Systems
4.1 ANN MODEL OF AIRCRAFT MOTION BASED ON A MULTILAYER NEURAL NETWORK 133
of the outputs y(t) can be fed to the input of the NARX model instead of their estimates ŷ, as was the case in the previous method. This approach has two main advantages. First, the accuracy of the obtained NARX model is increased. Second, it becomes possible to use the usual static error backpropagation method for its training, whereas learning a NARX model with a purely parallel architecture requires some form of the dynamic error backpropagation method.

4.1.2 Learning of the Neural Network Model of Aircraft Motion in Batch Mode

The ANN model is trained in the standard way [5,6]: training is treated as an optimization problem, namely, the minimization problem for the error e = y − ŷ. The objective function is the sum of squares of errors over the entire training sample,

    E(w) = (1/2) e^T(w) e(w),    e = [e_1, e_2, ..., e_N]^T,

where e(w) = y − ŷ(w), w is the M-dimensional vector of configurable network parameters, and N is the sample length.

We perform the minimization of the objective function E(w) with respect to the vector w using the Levenberg–Marquardt method. The adjustment of the vector w at each optimization step is as follows:

    w_{n+1} = w_n + (J^T J + μE)^{-1} J^T e,

where E is the identity matrix and J = J(w_n) is the Jacobi matrix, i.e., an (N × M) matrix whose ith row is the transposed gradient of the function e_i.

The most time-consuming element of the training process is the computation of the Jacobian at each step. This operation is performed using the error backpropagation algorithm [5], which takes up most of the time spent learning the model.

4.1.3 Learning of the Neural Network Model of Aircraft Motion in Real-Time Mode

The ANN models discussed in this chapter use sigmoid activation functions for the hidden layer neurons. Such global activation functions provide the ANN model with good generalization properties. However, modification of any tunable parameter changes the behavior of the network throughout the entire input domain. This means that adapting the network to new data may decrease the model's accuracy on previously seen data. Thus, to take the incoming measurements into account, ANN models of this type would have to be trained on a very large sample, which is not reasonable from a computational point of view.

To overcome this problem (that is, to perform adaptation not only for the current measurements but over some sliding time window), we can use the recursive least-squares method (RLSM), which can be considered a particular case of the Kalman filter (KF) for the estimation of constant parameters. However, KFs and the RLSM are directly applicable only to systems whose observations are linear with respect to the estimated parameters, while the neural network observation equation is nonlinear. Therefore, in order to use the KF, the observation equation must be linearized. In particular, statistical linearization can be used for this purpose.

The application of this approach to ANN modeling is described in detail in [5]. Again we can see that, just as in the case of batch training of the ANN model, the computation of the Jacobian J_k is the most time-consuming operation of the whole procedure.

To obtain a model with the required accuracy, the training data are taken to be a sequence of values on a certain sliding observation win-
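The Levenberg–Marquardt update step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function lm_step and the toy linear model are hypothetical, and J is taken here as the Jacobian of the model outputs with respect to w, so that, with e = y − ŷ, the plus sign in the update reduces the error. For an actual ANN model, J would be supplied by the error backpropagation algorithm, which is the time-consuming part of the procedure.

```python
import numpy as np

def lm_step(w, e, J, mu):
    """One Levenberg-Marquardt step: w_next = w + (J^T J + mu*E)^{-1} J^T e,
    where E is the identity matrix and e = y - yhat is the error vector."""
    A = J.T @ J + mu * np.eye(w.size)      # J^T J + mu * identity (M x M)
    return w + np.linalg.solve(A, J.T @ e)

# Toy illustration on a linear model yhat = X @ w, whose Jacobian with
# respect to w is simply X (a neural network would compute J numerically).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))               # N = 50 samples, M = 3 parameters
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for _ in range(20):
    e = y - X @ w                          # error vector e = y - yhat
    w = lm_step(w, e, X, mu=1e-3)

print(np.allclose(w, w_true, atol=1e-4))
```

Note the role of the damping parameter mu: for large mu the step approaches a small gradient-descent step, while for mu near zero it approaches the Gauss–Newton step; in practice mu is adjusted between iterations.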