Page 821 - Mechanical Engineers' Handbook (Volume 2)

Neural Networks in Feedback Control Systems

    0 = (∂V/∂x)^T f + Q − (1/4)(∂V/∂x)^T g(x) R^{-1} g^T(x) (∂V/∂x)        (10)
The boundary condition for this equation is V(0) = 0. Solving this equation yields the optimal value function V(x), whence the optimal control may be computed from the cost gradient using (4).
                             This procedure will give the optimal control in feedback form for any nonlinear system.
                          Unfortunately, the HJB equation cannot be solved for most nonlinear systems. In the linear
                          system case, the HJB equation yields the Riccati equation, for which efficient solution tech-
                          niques are available. However, most systems of interest today in aerospace, vehicles, and
                          industry are nonlinear.
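In the scalar linear case the Riccati reduction can even be solved in closed form, which makes the contrast with the general nonlinear case concrete. The following is a minimal sketch, assuming an illustrative plant x' = a·x + b·u and cost weights q, r that are not taken from the text:

```python
import math

# Scalar linear plant x' = a*x + b*u with cost integral of (q*x^2 + r*u^2).
# The values of a, b, q, r below are illustrative assumptions.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

# With V(x) = p*x^2, the HJB equation reduces to the scalar Riccati equation
#   2*a*p - (b**2 / r) * p**2 + q = 0;
# the positive root is the stabilizing solution.
p = (a + math.sqrt(a**2 + q * b**2 / r)) * r / b**2

# Optimal feedback from the cost gradient: u = -(1/2) r^{-1} b dV/dx = -k*x
k = b * p / r

# The closed-loop pole a - b*k is negative, so the loop is stable
print(p, k, a - b * k)
```

For matrix-valued linear systems the same reduction yields the algebraic Riccati equation, for which standard numerical solvers exist.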
Therefore, one may use an SA approach wherein (8) and (9) are iterated to determine the sequences V^(i), u^(i). The initial stabilizing control u^(0) used in (8) to find V^(0) is easily determined using, for example, the linear quadratic regulator (LQR) for the linearization of (6).
It has been shown by Saridis and Lee^39 that the SA converges to the optimal solution V*, u* of the HJB equation. Let the region of asymptotic stability (RAS) of the optimal solution be Ω* and the RAS at iteration i be Ω^(i). Then, in fact, it has been shown that:

   u^(i) is stabilizing for all i;
   V^(i) → V*, u^(i) → u*, Ω^(i) → Ω* uniformly;
   V^(i)(x) ≥ V^(i+1)(x), that is, the value function decreases; and
   Ω^(i) ⊆ Ω^(i+1), that is, the RAS increases.

In fact, Ω* is the largest RAS of any admissible control law.
                          NNs for Computation of Successive Approximation Solution
It is difficult to solve Eqs. (8) and (9) as required for the SA method just given. Beard et al.^40 showed how to implement the SA algorithm using the Galerkin approximation to solve the nonlinear Lyapunov equation. This method is computationally intensive, since it requires the evaluation of numerous integrals. It was shown in Ref. 38 how to use NNs to compute
                          the SA solution at each iteration. This yields a computationally effective method for deter-
                          mining nearly optimal controls for a general class of nonlinear constrained input systems.
The value function at each iteration is approximated using a NN by

    V(x) ≈ V(x, w_j) = w^(i)T φ(x)

with w_j the NN weights and φ(x) a basis set of activation functions. To satisfy the initial condition V^(i)(0) = 0 and the symmetry requirements on V(x), the activation functions were selected as a basis of even polynomials in x. Then the parameterized nonlinear Lyapunov equation becomes

    0 = w^(i)T ∇φ(x)(f(x) + g(x)u^(i)) + Q + u^(i)T R u^(i)
with u^(i) the current control value. Evaluating this equation at enough sample values of x, it can easily be solved for the weights using, for example, least squares. The sample values of x must satisfy a condition known as persistence of excitation in order to obtain a unique least-squares solution for the weights. The number of samples selected must be greater than the number of NN weights. Then, the next iteration value of the control is given by
    u^(i+1)(x) = −(1/2) R^{-1} g^T(x) ∇φ^T(x) w^(i)
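A single iteration of this least-squares procedure can be sketched as follows. To keep the result checkable, the plant here is taken linear, so the exact value function lies in the span of the even-polynomial basis; the plant, the two-term basis, and the sample grid are all illustrative assumptions, not from the text:

```python
import numpy as np

# One iteration of the NN least-squares solution of the parameterized
# Lyapunov equation. The plant x' = a*x + b*u is linear here only so the
# answer can be verified in closed form; the method targets nonlinear f, g.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k0 = 2.0                       # initial stabilizing control u0(x) = -k0*x

def f(x): return a * x         # drift term
def g(x): return b             # input gain
def u0(x): return -k0 * x      # current control value

# Even-polynomial activation basis phi(x) = [x^2, x^4] and its gradient
def grad_phi(x):
    return np.array([2.0 * x, 4.0 * x**3])

# 20 nonzero samples: more samples than the 2 NN weights, and rich enough
# to satisfy persistence of excitation for a unique least-squares solution
xs = np.linspace(0.1, 2.0, 20)

# Stack one row of  w' grad_phi(x)(f(x) + g(x) u0(x)) = -(Q + u0' R u0)
# per sample and solve for the weights w in the least-squares sense
A = np.array([grad_phi(x) * (f(x) + g(x) * u0(x)) for x in xs])
c = -np.array([q * x**2 + r * u0(x) ** 2 for x in xs])
w, *_ = np.linalg.lstsq(A, c, rcond=None)

# Next-iteration control: u1(x) = -(1/2) R^{-1} g'(x) grad_phi'(x) w
def u1(x):
    return -0.5 * (1.0 / r) * g(x) * (grad_phi(x) @ w)

print(w)   # for this linear plant the exact solution is w = [2.5, 0]
```

For this plant the Lyapunov equation is solved exactly by V(x) = 2.5 x^2, so the least-squares fit recovers w = [2.5, 0] and the updated control u1(x) = -2.5 x; for a genuinely nonlinear plant the fit is instead the NN's best approximation over the sampled region.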
                             Using a Sobolev space setting, it was shown that under certain mild assumptions the
                          NN solution converges in the mean to a suitably close approximation of the optimal solution.