Page 143 - Neural Network Modeling and Identification of Dynamical Systems

4.1 ANN MODEL OF AIRCRAFT MOTION BASED ON A MULTILAYER NEURAL NETWORK  133
of the outputs y(t) can be fed to the input of the NARX model instead of their estimates ŷ(t), as was the case in the previous method. This approach has two main advantages. First, the accuracy of the obtained NARX model is increased. Second, it becomes possible to train it with the usual static error backpropagation method, whereas learning a NARX model with a purely parallel architecture requires some form of the dynamic error backpropagation method.

4.1.2 Learning of the Neural Network Model of Aircraft Motion in Batch Mode

The ANN model is trained in the standard way [5,6]: training is treated as an optimization problem, namely, the minimization problem for the error e = y − ŷ. The objective function is the sum of squares of errors over the entire training sample,

    E(w) = (1/2) eᵀ(w) e(w),    e = [e₁, e₂, ..., e_N]ᵀ,

where e(w) = y − ŷ(w), w is the M-dimensional vector of configurable network parameters, and N is the sample length.

We minimize the objective function E(w) with respect to the vector w using the Levenberg–Marquardt method. The adjustment of the vector w at each optimization step is as follows:

    w_{n+1} = w_n − (JᵀJ + μE)⁻¹ Jᵀ e,

where E is the identity matrix and J = J(w_n) is the Jacobi matrix, i.e., an (N × M) matrix whose ith row is the transposed gradient of the error component e_i.

The most time-consuming element of the training process is the computation of the Jacobian at each step. This operation is performed using the error backpropagation algorithm [5], which takes up most of the time spent learning the model.

4.1.3 Learning of the Neural Network Model of Aircraft Motion in Real-Time Mode

The ANN models discussed in this chapter use sigmoid activation functions for the hidden layer neurons. Such global activation functions provide the ANN model with good generalization properties. However, a modification of any tunable parameter changes the behavior of the network throughout the entire input domain. Consequently, adapting the network to new data may degrade the model accuracy on previously seen data. Thus, to take the incoming measurements into account, ANN models of this type would have to be trained on a very large sample, which is unreasonable from a computational point of view.

To overcome this problem (that is, to perform adaptation not only for the current measurements but for some sliding time window), we can use the recursive least-squares method (RLSM), which can be considered a particular case of the Kalman filter (KF) for the estimation of constant parameters. However, KFs and RLSMs are directly applicable only to systems whose observations are linear with respect to the estimated parameters, while the neural network observation equation is nonlinear. Therefore, in order to use the KF, the observation equation must be linearized. In particular, statistical linearization can be used for this purpose.

The application of this approach to ANN modeling is described in detail in [5]. Again we can see that, just as in the case of batch training of the ANN model, the computation of the Jacobian J_k is the most time-consuming operation of the whole procedure.

To obtain a model with the required accuracy, the training data are taken to be a sequence of values on a certain sliding observation win-
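The distinction between the parallel (closed-loop) and series-parallel (open-loop) NARX training schemes can be sketched with a toy model. The linear "network" and the plant below are illustrative stand-ins, not the aircraft model of this chapter:

```python
import numpy as np

def model(w, y_prev, u_prev):
    # stand-in for the ANN mapping [y(t-1), u(t-1)] -> y(t)
    return w[0] * y_prev + w[1] * u_prev

w = np.array([0.9, 0.2])                 # current model parameters
u = np.sin(0.1 * np.arange(100))         # control input
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + 0.3 * u[t - 1]   # "measured" plant output

# Series-parallel scheme: the measured y(t-1) is fed to the model input,
# so each one-step prediction is a static mapping and the usual static
# error backpropagation applies.
e_sp = np.array([y[t] - model(w, y[t - 1], u[t - 1]) for t in range(1, 100)])

# Parallel scheme: the model's own estimate is fed back, so the error
# depends on the whole simulated trajectory and dynamic backpropagation
# is required to differentiate it.
yhat = np.zeros(100)
for t in range(1, 100):
    yhat[t] = model(w, yhat[t - 1], u[t - 1])
e_par = y - yhat
```

With the same parameter vector, the two schemes generally yield different error sequences, which is why the series-parallel errors can be minimized by static methods while the parallel errors cannot.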
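The Levenberg–Marquardt step from Section 4.1.2 can be sketched on a small curve-fitting problem. The exponential model, its hand-coded Jacobian, and the fixed damping factor below are illustrative assumptions; in the book's setting ŷ is the ANN output and J is obtained by backpropagation:

```python
import numpy as np

def residual(w, x, y):
    # e(w) = y - yhat(w) for the toy model yhat = w0 * exp(w1 * x)
    return y - w[0] * np.exp(w[1] * x)

def jacobian(w, x):
    # J[i, j] = d e_i / d w_j: row i is the transposed gradient of e_i
    J = np.empty((x.size, 2))
    J[:, 0] = -np.exp(w[1] * x)
    J[:, 1] = -w[0] * x * np.exp(w[1] * x)
    return J

def lm_step(w, x, y, mu):
    e = residual(w, x, y)
    J = jacobian(w, x)
    # w_{n+1} = w_n - (J^T J + mu*I)^{-1} J^T e
    return w - np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * x)               # noise-free data from w = (2, -1.5)
w = np.array([1.0, -1.0])                # initial guess
for _ in range(50):
    w = lm_step(w, x, y, mu=1e-3)        # fixed damping for simplicity
```

A production implementation would adapt μ between steps (increasing it when a step fails to reduce E), which is what makes the method interpolate between gradient descent and the Gauss–Newton method.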
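For an observation that is linear in the parameters, the RLSM update mentioned in Section 4.1.3 takes a simple recursive form. The following is a minimal sketch with a forgetting factor, which discounts old samples and thus realizes the sliding-window adaptation; the regressor and noise model are illustrative assumptions:

```python
import numpy as np

def rls_update(w, P, h, y, lam=0.99):
    # One recursive least-squares step for y_k = h_k^T w + noise,
    # with forgetting factor lam < 1 discounting past data.
    Ph = P @ h
    K = Ph / (lam + h @ Ph)              # gain vector
    w = w + K * (y - h @ w)              # correct estimate by the innovation
    P = (P - np.outer(K, Ph)) / lam      # update the covariance matrix
    return w, P

rng = np.random.default_rng(0)
true_w = np.array([2.0, -0.5])
w = np.zeros(2)
P = 1e3 * np.eye(2)                      # large initial uncertainty
for k in range(500):
    h = np.array([1.0, np.sin(0.1 * k)])     # regressor for sample k
    y = h @ true_w + 0.01 * rng.standard_normal()
    w, P = rls_update(w, P, h, y)
```

Each incoming measurement refines the estimate at constant cost per step, instead of retraining on an ever-growing sample.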
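Because the neural network observation equation is nonlinear in the parameters, the KF can only be applied after linearizing it, as the text notes. The sketch below treats the weights of a tiny one-neuron "network" as constant states and linearizes the observation at the current estimate via its Jacobian (first-order Taylor linearization standing in for the statistical linearization mentioned in the text); the model, noise levels, and sample counts are all illustrative assumptions:

```python
import numpy as np

def h(w, x):
    # tiny stand-in network: 1 input, 1 tanh hidden neuron
    return w[1] * np.tanh(w[0] * x)

def h_jac(w, x):
    # H_k = dh/dw, the per-sample analogue of the Jacobian J_k
    s = np.tanh(w[0] * x)
    return np.array([w[1] * x * (1.0 - s ** 2), s])

rng = np.random.default_rng(1)
true_w = np.array([1.5, 0.8])
w = np.array([1.0, 1.0])                 # initial parameter estimate
P = np.eye(2)                            # parameter covariance
R = 1e-2                                 # observation noise variance
Q = 1e-6 * np.eye(2)                     # small process noise keeps adaptivity
for k in range(2000):
    x = rng.uniform(-2.0, 2.0)
    y = h(true_w, x) + np.sqrt(R) * rng.standard_normal()
    P = P + Q                            # time update (constant parameters)
    H = h_jac(w, x)                      # linearize observation at w_k
    S = H @ P @ H + R                    # innovation variance (scalar here)
    K = P @ H / S                        # Kalman gain
    w = w + K * (y - h(w, x))            # measurement update
    P = P - np.outer(K, H @ P)           # covariance update
```

As in batch training, evaluating the Jacobian `H` at every step dominates the cost once the network is of realistic size.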