Page 821 - Mechanical Engineers' Handbook (Volume 2)

Neural Networks in Feedback Control Systems

    0 = (∂V/∂x)^T f + Q − (1/4)(∂V/∂x)^T g(x) R^{-1} g^T(x) (∂V/∂x)        (10)
The boundary condition for this equation is V(0) = 0. Solving this equation yields the optimal value function V(x), whence the optimal control may be computed from the cost gradient using (4).
                             This procedure will give the optimal control in feedback form for any nonlinear system.
                          Unfortunately, the HJB equation cannot be solved for most nonlinear systems. In the linear
                          system case, the HJB equation yields the Riccati equation, for which efficient solution tech-
                          niques are available. However, most systems of interest today in aerospace, vehicles, and
                          industry are nonlinear.
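In the scalar linear case the Riccati reduction can even be solved in closed form, which makes the contrast with the general nonlinear case concrete. The following is a minimal sketch, assuming an illustrative plant x' = a·x + b·u and cost weights q, r that are not taken from the text:

```python
import math

# Scalar linear plant x' = a*x + b*u with cost integral of (q*x^2 + r*u^2).
# The values of a, b, q, r below are illustrative assumptions.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

# With V(x) = p*x^2, the HJB equation reduces to the scalar Riccati equation
#   2*a*p - (b**2 / r) * p**2 + q = 0;
# the positive root is the stabilizing solution.
p = (a + math.sqrt(a**2 + q * b**2 / r)) * r / b**2

# Optimal feedback from the cost gradient: u = -(1/2) r^{-1} b dV/dx = -k*x
k = b * p / r

# The closed-loop pole a - b*k is negative, so the loop is stable
print(p, k, a - b * k)
```

For matrix-valued linear systems the same reduction yields the algebraic Riccati equation, for which standard numerical solvers exist.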
Therefore, one may use an SA approach wherein (8) and (9) are iterated to determine the sequences V^(i), u^(i). The initial stabilizing control u^(0) used in (8) to find V^(0) is easily determined using, for example, the linear quadratic regulator (LQR) for the linearization of (6).
It has been shown by Saridis and Lee^39 that the SA converges to the optimal solution V*, u* of the HJB equation. Let the region of asymptotic stability (RAS) of the optimal solution be Ω* and the RAS at iteration i be Ω^(i). Then, in fact, it has been shown that:

   u^(i) is stabilizing for all i;
   V^(i) → V*, u^(i) → u*, Ω^(i) → Ω* uniformly;
   V^(i)(x) ≥ V^(i+1)(x), that is, the value function decreases; and
   Ω^(i) ⊆ Ω^(i+1), that is, the RAS increases.

In fact, Ω* is the largest RAS of any admissible control law.
                          NNs for Computation of Successive Approximation Solution
It is difficult to solve Eqs. (8) and (9) as required for the SA method just given. Beard et al.^40 showed how to implement the SA algorithm using the Galerkin approximation to solve the nonlinear Lyapunov equation. This method is computationally intensive, since it requires the evaluation of numerous integrals. It was shown in Ref. 38 how to use NNs to compute
                          the SA solution at each iteration. This yields a computationally effective method for deter-
                          mining nearly optimal controls for a general class of nonlinear constrained input systems.
The value function at each iteration is approximated using a NN by

    V(x) ≈ V(x, w_j) = w^(i)T φ(x)

with w_j the NN weights and φ(x) a basis set of activation functions. To satisfy the initial condition V^(i)(0) = 0 and the symmetry requirements on V(x), the activation functions were selected as a basis of even polynomials in x. Then the parameterized nonlinear Lyapunov equation becomes

    0 = w^(i)T ∇φ(x)(f(x) + g(x)u^(i)) + Q + u^(i)T R u^(i)
with u^(i) the current control value. Evaluating this equation at enough sample values of x, it can easily be solved for the weights using, for example, least squares. The sample values of x must satisfy a condition known as persistence of excitation in order to obtain a unique least-squares solution for the weights. The number of samples selected must be greater than the number of NN weights. Then, the next iteration value of the control is given by
    u^(i+1)(x) = −(1/2) R^{-1} g^T(x) ∇φ^T(x) w^(i)
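A single iteration of this least-squares procedure can be sketched as follows. To keep the result checkable, the plant here is taken linear, so the exact value function lies in the span of the even-polynomial basis; the plant, the two-term basis, and the sample grid are all illustrative assumptions, not from the text:

```python
import numpy as np

# One iteration of the NN least-squares solution of the parameterized
# Lyapunov equation. The plant x' = a*x + b*u is linear here only so the
# answer can be verified in closed form; the method targets nonlinear f, g.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k0 = 2.0                       # initial stabilizing control u0(x) = -k0*x

def f(x): return a * x         # drift term
def g(x): return b             # input gain
def u0(x): return -k0 * x      # current control value

# Even-polynomial activation basis phi(x) = [x^2, x^4] and its gradient
def grad_phi(x):
    return np.array([2.0 * x, 4.0 * x**3])

# 20 nonzero samples: more samples than the 2 NN weights, and rich enough
# to satisfy persistence of excitation for a unique least-squares solution
xs = np.linspace(0.1, 2.0, 20)

# Stack one row of  w' grad_phi(x)(f(x) + g(x) u0(x)) = -(Q + u0' R u0)
# per sample and solve for the weights w in the least-squares sense
A = np.array([grad_phi(x) * (f(x) + g(x) * u0(x)) for x in xs])
c = -np.array([q * x**2 + r * u0(x) ** 2 for x in xs])
w, *_ = np.linalg.lstsq(A, c, rcond=None)

# Next-iteration control: u1(x) = -(1/2) R^{-1} g'(x) grad_phi'(x) w
def u1(x):
    return -0.5 * (1.0 / r) * g(x) * (grad_phi(x) @ w)

print(w)   # for this linear plant the exact solution is w = [2.5, 0]
```

For this plant the Lyapunov equation is solved exactly by V(x) = 2.5 x^2, so the least-squares fit recovers w = [2.5, 0] and the updated control u1(x) = -2.5 x; for a genuinely nonlinear plant the fit is instead the NN's best approximation over the sampled region.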
                             Using a Sobolev space setting, it was shown that under certain mild assumptions the
                          NN solution converges in the mean to a suitably close approximation of the optimal solution.