Page 826 - Mechanical Engineers' Handbook (Volume 2)
$$Q_h(x_k, h(x_k)) = V_h(x_k)$$
where subscript h denotes a prescribed control or policy sequence $u_k = h(x_k)$. A recursion
for Q is given by
$$Q_h(x_k, u_k) = r(x_k, u_k) + Q_h(x_{k+1}, h(x_{k+1}))$$
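As a minimal sketch of the recursion for Q under a fixed policy, the Python below evaluates Q_h on a small deterministic chain by repeated substitution and checks that Q_h(x_k, h(x_k)) equals V_h(x_k). The system `f`, the stage cost `r`, the policy `h`, and the helper names are assumptions made for illustration. The recursion V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1}) used for the comparison is likewise assumed here rather than quoted from this passage.

```python
# Toy deterministic system (assumed for illustration): states 0..4 on a line,
# controls step left or right, and state 0 is an absorbing zero-cost goal.
STATES = range(5)
CONTROLS = (-1, +1)

def f(x, u):
    """Next state x_{k+1} = f(x_k, u_k); state 0 is absorbing."""
    if x == 0:
        return 0
    return min(max(x + u, 0), 4)

def r(x, u):
    """Stage cost r(x_k, u_k); zero at the goal so undiscounted sums stay finite."""
    return 0.0 if x == 0 else float(x + abs(u))

def h(x):
    """Prescribed policy u_k = h(x_k): always step toward the goal."""
    return -1

def evaluate_q(policy, sweeps=50):
    """Iterate Q_h(x,u) = r(x,u) + Q_h(x', policy(x')) with x' = f(x,u)."""
    q = {(x, u): 0.0 for x in STATES for u in CONTROLS}
    for _ in range(sweeps):
        q = {(x, u): r(x, u) + q[(f(x, u), policy(f(x, u)))]
             for x in STATES for u in CONTROLS}
    return q

def evaluate_v(policy, sweeps=50):
    """Iterate V_h(x) = r(x, policy(x)) + V_h(f(x, policy(x)))."""
    v = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        v = {x: r(x, policy(x)) + v[f(x, policy(x))] for x in STATES}
    return v

q_h = evaluate_q(h)
v_h = evaluate_v(h)

# Q_h evaluated along the policy reproduces the value function.
print(all(q_h[(x, h(x))] == v_h[x] for x in STATES))
```

Because the policy drives every state to the zero-cost goal in a finite number of steps, the repeated substitution settles after a few sweeps; a policy that never reached the goal would make the undiscounted sums diverge.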
In terms of Q, Bellman’s principle is particularly easy to write; in fact, defining the optimal
Q value as
$$Q^*(x_k, u_k) = r(x_k, u_k) + V^*(x_{k+1})$$
one has the optimal value as
$$V^*(x_k) = \min_{u_k} Q^*(x_k, u_k)$$
The optimal control policy is given by
$$h^*(x_k) = \arg\min_{u_k} Q^*(x_k, u_k)$$
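The three relations for the optimal Q value, the optimal value, and the optimal policy can be illustrated on a tiny tabular problem. The sketch below is only an illustration: the transition table `NEXT`, the cost table `COST`, and the state and control names are hypothetical, and V* is obtained by plain value iteration, which is one possible way to get it.

```python
# Hypothetical deterministic system given as tables: NEXT[x][u] is x_{k+1},
# COST[x][u] is r(x_k, u_k). State "goal" is absorbing with zero cost.
NEXT = {
    "a": {"slow": "b", "fast": "goal"},
    "b": {"slow": "goal", "fast": "goal"},
    "goal": {"slow": "goal", "fast": "goal"},
}
COST = {
    "a": {"slow": 1.0, "fast": 5.0},
    "b": {"slow": 2.0, "fast": 3.0},
    "goal": {"slow": 0.0, "fast": 0.0},
}

def optimal_value(sweeps=20):
    """Value iteration: V(x) <- min_u [ r(x,u) + V(x_{k+1}) ]."""
    v = {x: 0.0 for x in NEXT}
    for _ in range(sweeps):
        v = {x: min(COST[x][u] + v[NEXT[x][u]] for u in NEXT[x]) for x in NEXT}
    return v

v_star = optimal_value()

# Q*(x,u) = r(x,u) + V*(x_{k+1})
q_star = {x: {u: COST[x][u] + v_star[NEXT[x][u]] for u in NEXT[x]} for x in NEXT}

# V*(x) = min_u Q*(x,u)
v_from_q = {x: min(q_star[x].values()) for x in NEXT}

# h*(x) = arg min_u Q*(x,u)
h_star = {x: min(q_star[x], key=q_star[x].get) for x in NEXT}

print(q_star["a"], v_from_q["a"], h_star["a"])
```

Once Q* is available, the minimization needs no knowledge of the system dynamics: the policy is read off by comparing Q* values across controls at the current state.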
Watkins showed that the following successive iteration scheme, known as Q learning, converges
to the optimal solution:
1. Find the Q value for the prescribed policy $h_j(x_k)$:
$$Q_j(x_k, u_k) = r(x_k, u_k) + Q_j(x_{k+1}, h_j(x_{k+1}))$$
2. Policy improvement:
$$h_{j+1}(x_k) = \arg\min_{u_k} Q_j(x_k, u_k)$$
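The two steps can be sketched in tabular form as follows. This is an illustration under stated assumptions, not the scheme as implemented in the referenced work: the chain system `f`, the cost `r`, the initial policy `h0`, and all function names are hypothetical, and step 1 is solved here with a known model by repeated substitution, whereas a data-driven implementation would use measured transitions instead.

```python
# Toy deterministic system (assumed): states 0..4, controls step left by 1 or 2,
# state 0 is an absorbing zero-cost goal.
STATES = range(5)
CONTROLS = (-1, -2)

def f(x, u):
    """Next state x_{k+1}; clamped at the goal state 0."""
    return max(x + u, 0)

def r(x, u):
    """Stage cost r(x_k, u_k); zero once the goal is reached."""
    return 0.0 if x == 0 else float(x + 2 * abs(u))

def evaluate_q(policy, sweeps=20):
    """Step 1: iterate Q_j(x,u) = r(x,u) + Q_j(x', h_j(x')) with x' = f(x,u)."""
    q = {(x, u): 0.0 for x in STATES for u in CONTROLS}
    for _ in range(sweeps):
        q = {(x, u): r(x, u) + q[(f(x, u), policy[f(x, u)])]
             for x in STATES for u in CONTROLS}
    return q

def improve(q):
    """Step 2: h_{j+1}(x) = arg min_u Q_j(x,u)."""
    return {x: min(CONTROLS, key=lambda u: q[(x, u)]) for x in STATES}

def q_policy_iteration(policy, max_iters=10):
    """Alternate steps 1 and 2 until the policy stops changing."""
    for _ in range(max_iters):
        q = evaluate_q(policy)
        new_policy = improve(q)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, q

# Initial policy (assumed): always step left by 1, which reaches the goal.
h0 = {x: -1 for x in STATES}
h_final, q_final = q_policy_iteration(h0)
print(h_final)
```

The initial policy must bring every state to the zero-cost goal; otherwise the undiscounted recursion in step 1 has no finite solution and the substitution loop would not settle.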
Using NNs to approximate the Q function and the policy, one can write the ADHDP algorithm
in a very straightforward manner. Since the control input action $u_k$ is now explicitly an input
to the critic NN, this is known as action-dependent HDP (ADHDP). Q learning converges faster than
HDP and can be used in the case of unknown system dynamics [48].
An action-dependent version of DHP is also available wherein the gradients of the Q
function are approximated using NNs. Note that two NNs are needed, since there are two
gradients, as Q is a function of both $x_k$ and $u_k$.
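Written out, the two gradients in question are the partial derivatives of Q with respect to its two arguments. The symbols $\lambda^x_k$ and $\lambda^u_k$ below are names introduced here for illustration only; they are not notation taken from the text.

```latex
\lambda^{x}_k = \frac{\partial Q(x_k, u_k)}{\partial x_k},
\qquad
\lambda^{u}_k = \frac{\partial Q(x_k, u_k)}{\partial u_k}
```

Under this reading, one NN would approximate each quantity, which is why two networks are required.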
11 HISTORICAL DEVELOPMENT, REFERENCED WORK, AND FURTHER STUDY
A firm foundation for the use of NNs in feedback control systems has been developed over
the years by many researchers. Included here is a historical development and references to
the body of work in neurocontrol.
11.1 NN for Feedback Control
The use of NNs in feedback control systems was first proposed by Werbos [8]. Since then, NN
control has been studied by many researchers. Recently, NNs have entered the mainstream
of control theory as a natural extension of adaptive control to systems that are nonlinear in
the tunable parameters. The state of NN control is well illustrated by papers in the Automatica
special issue on NN control [49]. Overviews of the initial work in NN control are
provided by Miller et al. [50] and the Handbook of Intelligent Control [51], which highlighted a
host of difficulties to be addressed for closed-loop control applications. Neural network

