Page 824 - Mechanical Engineers' Handbook (Volume 2)


                                   0 = w_ij^T ∇σ(x) ẋ + r(x, u_j, d_i)
                                     = w_ij^T ∇σ(x) F(x, u_j, d_i) + h^T h + u_j^T R u_j − γ^2 d_i^T d_i
                           which can easily be solved for the weights using, for example, least squares. Then, on
                           disturbance iterations the next disturbance is given by
                                   d_{i+1}(x) = (1/(2γ^2)) k^T(x) ∇σ^T(x) w_ij
                           and on control iterations the improved control is given by
                                   u_{j+1}(x) = −(1/2) R^{-1} g^T(x) ∇σ^T(x) w_ij
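The least-squares step for the weights can be illustrated numerically. The sketch below is purely illustrative (the basis σ, the dynamics, the fixed policies, and the attenuation value are made-up assumptions, not taken from the text): it samples the residual 0 = w^T ∇σ(x) ẋ + r(x, u, d) at several states and solves the resulting linear system for w.

```python
import numpy as np

# Value approximation V(x) ≈ w^T sigma(x) with a polynomial basis (hypothetical choice).
def sigma_grad(x):
    # gradients of the basis functions [x^2, x^4]
    return np.array([2.0 * x, 4.0 * x**3])

# Illustrative scalar dynamics and fixed policy iterates (all assumed, not from the text).
f = lambda x, u, d: -x + u + d                    # xdot = F(x, u, d)
u_pol = lambda x: -0.5 * x                        # current control iterate u_j
d_pol = lambda x: 0.1 * x                         # current disturbance iterate d_i
gamma2 = 4.0                                      # gamma^2, attenuation level (assumed)
r = lambda x, u, d: x**2 + u**2 - gamma2 * d**2   # utility h^T h + u^T R u - gamma^2 d^T d

# Sample states and form one residual row per sample:
#   w^T (grad sigma(x) * xdot) = -r(x, u, d)
xs = np.linspace(-2.0, 2.0, 50)
A = np.array([sigma_grad(x) * f(x, u_pol(x), d_pol(x)) for x in xs])
b = np.array([-r(x, u_pol(x), d_pol(x)) for x in xs])
w, *_ = np.linalg.lstsq(A, b, rcond=None)
print(w)  # weights of the value approximation for this (u, d) pair
```

Each iteration of the algorithm refits w this way for the current policy pair before the disturbance or control update is applied.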
                           This algorithm is shown to converge to the approximately optimal H∞ solution. This yields
                           an NN feedback controller as shown in Fig. 17 for the H∞ case.
            10   APPROXIMATE DYNAMIC PROGRAMMING AND ADAPTIVE CRITICS
                           Approximate dynamic programming (ADP) is based on the optimal formulation of the
                           feedback control problem. For discrete-time systems, the optimal control problem may be
                           solved using dynamic programming, which is a backward-in-time procedure and so unsuitable
                           for online implementation.37 ADP is based on using nonlinear approximators to solve the HJ
                           equations forward in time and was first suggested by Werbos. See Section 11 for cited
                           works of major researchers in the area of ADP.44 The current status of work in ADP is given
                           in Ref. 45.
                              The previous section presented the continuous-time formulation of the optimal control
                           problem. For discrete-time systems of the form
                                         x_{k+1} = f(x_k, u_k)
                           with k the time index, one may select the cost or performance measure
                                      V(x_k) = Σ_{i=k}^∞ γ^{i−k} r(x_i, u_i)

                           with γ a discount factor and r(x_k, u_k) known as the instantaneous utility. A first-difference
                           equivalent to this yields a recursion for the value function given by
                                    V(x_k) = r(x_k, u_k) + γ V(x_{k+1})
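This recursion can be checked numerically against the truncated discounted sum along a rollout. A minimal sketch, assuming an illustrative scalar system, policy, and utility (none of which are from the text):

```python
import numpy as np

# Illustrative scalar system, utility, and fixed policy (all hypothetical).
f = lambda x, u: 0.9 * x + u
r = lambda x, u: x**2 + u**2
policy = lambda x: -0.1 * x
gamma = 0.95  # discount factor

# Roll out the policy and accumulate the discounted cost
#   V(x_k) = sum_{i>=k} gamma^(i-k) r(x_i, u_i), truncated at N steps.
def rollout_value(x0, N=500):
    V, x = 0.0, x0
    for i in range(N):
        u = policy(x)
        V += gamma**i * r(x, u)
        x = f(x, u)
    return V

# The recursion V(x_k) = r(x_k, u_k) + gamma * V(x_{k+1}) holds along the rollout:
x0 = 1.0
u0 = policy(x0)
lhs = rollout_value(x0)
rhs = r(x0, u0) + gamma * rollout_value(f(x0, u0))
print(lhs, rhs)  # equal up to truncation error
```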
                           One may invoke Bellman’s principle to find the optimal cost as
                                 V*(x_k) = min_{u_k} (r(x_k, u_k) + γ V*(x_{k+1}))
                           and the optimal control as
                               u*(x_k) = arg min_{u_k} (r(x_k, u_k) + γ V*(x_{k+1}))
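On a small finite problem these two equations can be evaluated directly by stepping backward in time. A minimal sketch with a hypothetical five-state chain and three actions (all numerical choices are illustrative, not from the text):

```python
import numpy as np

# Hypothetical finite problem: states 0..4, actions move left / stay / right.
n_states, actions = 5, (-1, 0, 1)
gamma = 0.9
f = lambda x, u: min(max(x + u, 0), n_states - 1)  # deterministic, clamped transition
r = lambda x, u: (x - 2) ** 2 + 0.1 * abs(u)       # utility penalizes distance from state 2

# Backward-in-time dynamic programming over a finite horizon N:
#   V*_k(x) = min_u [ r(x, u) + gamma * V*_{k+1}(f(x, u)) ]
N = 50
V = np.zeros(n_states)  # terminal value V*_N = 0
for k in range(N - 1, -1, -1):
    V = np.array([min(r(x, u) + gamma * V[f(x, u)] for u in actions)
                  for x in range(n_states)])

# Optimal control via the arg-min, using the (near-stationary) value after N steps:
u_star = [min(actions, key=lambda u: r(x, u) + gamma * V[f(x, u)])
          for x in range(n_states)]
print(V, u_star)
```

Note that the value array must be filled in for all states at stage k+1 before stage k can be computed, which is exactly the backward-in-time structure the text describes.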
                           Determining the optimal controller using these equations requires an iterative procedure
                           known as dynamic programming that progresses backward in time. This is unsuitable for
                           real-time implementation and is computationally complex.
                              The goal of ADP is to provide approximate techniques for evaluating the optimal value
                           and optimal control using techniques that progress forward in time, so that they can be
                           implemented in actual control systems. Howard46 showed that the following successive
                           iteration scheme, known as policy iteration, converges to the optimal solution:
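In outline, policy iteration alternates policy evaluation (solve for the value of the current policy) with policy improvement (a greedy update using that value). A minimal sketch on a hypothetical finite deterministic problem (all numbers illustrative, not from the text):

```python
import numpy as np

# Hypothetical finite problem in the spirit of x_{k+1} = f(x_k, u_k).
n_states, actions, gamma = 5, (-1, 0, 1), 0.9
f = lambda x, u: min(max(x + u, 0), n_states - 1)
r = lambda x, u: (x - 2) ** 2 + 0.1 * abs(u)

policy = [0] * n_states          # initial policy: always "stay"
for _ in range(20):              # outer policy-iteration loop
    # Policy evaluation: solve V(x) = r(x, pi(x)) + gamma * V(f(x, pi(x)))
    # exactly, as the linear system (I - gamma * P) V = r_pi.
    P = np.zeros((n_states, n_states))
    r_pi = np.zeros(n_states)
    for x in range(n_states):
        P[x, f(x, policy[x])] = 1.0
        r_pi[x] = r(x, policy[x])
    V = np.linalg.solve(np.eye(n_states) - gamma * P, r_pi)

    # Policy improvement: greedy with respect to the evaluated value.
    new_policy = [min(actions, key=lambda u: r(x, u) + gamma * V[f(x, u)])
                  for x in range(n_states)]
    if new_policy == policy:     # no change: the policy is optimal
        break
    policy = new_policy
print(policy, V)
```

Unlike backward dynamic programming, each pass works with a complete policy defined over all states, so the scheme can run alongside the system rather than requiring the whole time horizon in advance.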