Figure 14 NN observer for output feedback control.


8.1  NN Reinforcement Learning Controller
A simple signal related to the performance of a robotic system is the signum of the sliding variable error, $R(t) = \mathrm{sgn}(r(t))$, with the sliding variable error given by $r = \dot{e} + \Lambda e$, where $e = q_d - q$ is the tracking error and the matrix $\Lambda$ is positive definite. Signal $R(t)$ satisfies the criteria required in reinforcement learning control: (i) It is simple, having values of only $0, \pm 1$, and (ii) the value of zero corresponds to a reward for good performance, while nonzero values correspond to a punishment signal. Therefore, $R(t)$ may be taken as a suitable reinforcement learning signal.
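
To make these signals concrete, here is a minimal Python sketch; the function name, signature, and the two-joint example values are illustrative assumptions, not from the text.

    import numpy as np

    def reinforcement_signal(q_d, qdot_d, q, qdot, Lam):
        """Form the sliding variable error r = edot + Lam @ e and R = sgn(r).

        e = q_d - q is the tracking error; Lam must be positive definite.
        Entries of R take only the values 0, +1, -1: zero rewards good
        performance, while nonzero entries act as a punishment signal.
        """
        e = q_d - q              # tracking error
        edot = qdot_d - qdot     # tracking error rate
        r = edot + Lam @ e       # sliding variable error
        return np.sign(r), r     # reinforcement signal R(t) and utility r(t)

    # Illustrative two-joint call with Lam = 5*I (values assumed):
    R, r = reinforcement_signal(np.array([1.0, 0.5]), np.zeros(2),
                                np.array([0.9, 0.5]), np.array([0.1, 0.0]),
                                5.0 * np.eye(2))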
Rigorous proofs of closed-loop stability and performance for reinforcement learning may be provided [31] by (i) using nonstandard Lyapunov functions, (ii) deriving novel modified NN tuning algorithms, and (iii) selecting a suitable multiloop control structure. The architecture of the reinforcement adaptive learning NN controller so derived is shown in Fig. 15. A performance evaluation loop has the desired trajectory $q_d(t)$ as the user input; this loop manufactures $r(t)$, which may be considered as the instantaneous utility. The critic element evaluates the signum function and so provides the reinforcement signal $R(t)$, which critiques the performance of the system.
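
The loop structure of Fig. 15 can be sketched in code as well. The sketch below reuses reinforcement_signal from above and assumes a one-tunable-layer actor $\tau = \hat{W}^{T}\tanh(V^{T}x)$ with fixed random $V$, tuned by a signum-driven rule $\dot{\hat{W}} = F\,\phi(x)\,R^{T}$; both the NN input $x$ and this rule are stand-ins for the modified tuning algorithms of Ref. 31, not a reproduction of them.

    class ActorNN:
        """One-tunable-layer action-generating NN (structure assumed).

        Output tau = W_hat^T phi(x) with basis phi(x) = tanh(V^T x), V fixed
        at random. Only W_hat is tuned, and only from R(t), via the assumed
        rule W_hat_dot = F phi(x) R^T; the stability-proven algorithms of
        Ref. 31 carry additional modification terms omitted here.
        """
        def __init__(self, n_in, n_hidden, n_out, gain=10.0, seed=0):
            rng = np.random.default_rng(seed)
            self.V = rng.standard_normal((n_in, n_hidden))
            self.W_hat = np.zeros((n_hidden, n_out))
            self.F = gain * np.eye(n_hidden)   # positive-definite tuning gain

        def phi(self, x):
            return np.tanh(self.V.T @ x)

        def forward(self, x):
            return self.W_hat.T @ self.phi(x)

        def update(self, x, R, dt):
            # Tuning driven by the reinforcement signal R(t) alone
            self.W_hat += dt * self.F @ np.outer(self.phi(x), R)

    def control_step(actor, q_d, qdot_d, q, qdot, Lam, dt):
        # Performance evaluation loop manufactures the utility r(t);
        # the critic element evaluates the signum to produce R(t)
        R, r = reinforcement_signal(q_d, qdot_d, q, qdot, Lam)
        x = np.concatenate([q_d - q, qdot_d - qdot])   # assumed NN input
        tau = actor.forward(x)    # action-generating loop: control torque
        actor.update(x, R, dt)    # actor tuned from R(t) alone
        return tau

For the two-joint example above, actor = ActorNN(n_in=4, n_hidden=20, n_out=2) matches the assumed input $x$.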
It is not easy to show how to tune the action-generating NN using only the reinforcement signal $R(t)$, which contains significantly less information than the full error signal $r(t)$. A successful proof can be based on the Lyapunov energy function

$$L(t) = \sum_{i=1}^{n} |r_i| + \frac{1}{2}\,\mathrm{tr}\bigl(\tilde{W}^{T} F^{-1} \tilde{W}\bigr)$$

where $\tilde{W} = W - \hat{W}$ is the NN weight estimation error and $F = F^{T} > 0$ is the tuning gain matrix.
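
Since the ideal weights $W$ are unknown in practice, $L(t)$ can only be evaluated in simulation, where a nominal $W$ is postulated. The following sketch (function name and arguments are illustrative, reusing numpy from the sketches above) computes $L(t)$ so its evolution can be monitored.

    def lyapunov_energy(r, W_hat, W_star, F):
        """Evaluate L(t) = sum_i |r_i| + (1/2) tr(Wtilde^T F^{-1} Wtilde).

        W_star stands in for the ideal NN weights, which are unknown in
        practice; this check is therefore for simulation studies only.
        """
        W_tilde = W_star - W_hat                  # weight estimation error
        quad = 0.5 * np.trace(W_tilde.T @ np.linalg.solve(F, W_tilde))
        return np.sum(np.abs(r)) + quad

A tuning law consistent with the proof should render $L(t)$ nonincreasing along closed-loop trajectories, so watching this quantity in simulation is a quick sanity check on a candidate implementation.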