Figure 14 NN observer for output feedback control.


8.1  NN Reinforcement Learning Controller
A simple signal related to the performance of a robotic system is the signum of the sliding variable error, $R(t) = \mathrm{sgn}(r(t))$, with the sliding variable error given by $r = \dot{e} + \Lambda e$, where $e = q_d - q$ is the tracking error and the matrix $\Lambda$ is positive definite. Signal $R(t)$ satisfies the criteria required in reinforcement learning control: (i) It is simple, having values of only $0, \pm 1$, and (ii) the value of zero corresponds to a reward for good performance, while nonzero values correspond to a punishment signal. Therefore, $R(t)$ may be taken as a suitable reinforcement learning signal.
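
To make these signals concrete, here is a minimal Python sketch; the function name, signature, and the two-joint example values are illustrative assumptions, not from the text.

    import numpy as np

    def reinforcement_signal(q_d, qdot_d, q, qdot, Lam):
        """Form the sliding variable error r = edot + Lam @ e and R = sgn(r).

        e = q_d - q is the tracking error; Lam must be positive definite.
        Entries of R take only the values 0, +1, -1: zero rewards good
        performance, while nonzero entries act as a punishment signal.
        """
        e = q_d - q              # tracking error
        edot = qdot_d - qdot     # tracking error rate
        r = edot + Lam @ e       # sliding variable error
        return np.sign(r), r     # reinforcement signal R(t) and utility r(t)

    # Illustrative two-joint call with Lam = 5*I (values assumed):
    R, r = reinforcement_signal(np.array([1.0, 0.5]), np.zeros(2),
                                np.array([0.9, 0.5]), np.array([0.1, 0.0]),
                                5.0 * np.eye(2))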
Rigorous proofs of closed-loop stability and performance for reinforcement learning may be provided [31] by (i) using nonstandard Lyapunov functions, (ii) deriving novel modified NN tuning algorithms, and (iii) selecting a suitable multiloop control structure. The architecture of the reinforcement adaptive learning NN controller so derived is shown in Fig. 15. A performance evaluation loop has the desired trajectory $q_d(t)$ as the user input; this loop manufactures $r(t)$, which may be considered as the instantaneous utility. The critic element evaluates the signum function and so provides the reinforcement signal $R(t)$, which critiques the performance of the system.
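
The loop structure of Fig. 15 can be sketched in code as well. The sketch below reuses reinforcement_signal from above and assumes a one-tunable-layer actor $\tau = \hat{W}^{T}\tanh(V^{T}x)$ with fixed random $V$, tuned by a signum-driven rule $\dot{\hat{W}} = F\,\phi(x)\,R^{T}$; both the NN input $x$ and this rule are stand-ins for the modified tuning algorithms of Ref. 31, not a reproduction of them.

    class ActorNN:
        """One-tunable-layer action-generating NN (structure assumed).

        Output tau = W_hat^T phi(x) with basis phi(x) = tanh(V^T x), V fixed
        at random. Only W_hat is tuned, and only from R(t), via the assumed
        rule W_hat_dot = F phi(x) R^T; the stability-proven algorithms of
        Ref. 31 carry additional modification terms omitted here.
        """
        def __init__(self, n_in, n_hidden, n_out, gain=10.0, seed=0):
            rng = np.random.default_rng(seed)
            self.V = rng.standard_normal((n_in, n_hidden))
            self.W_hat = np.zeros((n_hidden, n_out))
            self.F = gain * np.eye(n_hidden)   # positive-definite tuning gain

        def phi(self, x):
            return np.tanh(self.V.T @ x)

        def forward(self, x):
            return self.W_hat.T @ self.phi(x)

        def update(self, x, R, dt):
            # Tuning driven by the reinforcement signal R(t) alone
            self.W_hat += dt * self.F @ np.outer(self.phi(x), R)

    def control_step(actor, q_d, qdot_d, q, qdot, Lam, dt):
        # Performance evaluation loop manufactures the utility r(t);
        # the critic element evaluates the signum to produce R(t)
        R, r = reinforcement_signal(q_d, qdot_d, q, qdot, Lam)
        x = np.concatenate([q_d - q, qdot_d - qdot])   # assumed NN input
        tau = actor.forward(x)    # action-generating loop: control torque
        actor.update(x, R, dt)    # actor tuned from R(t) alone
        return tau

For the two-joint example above, actor = ActorNN(n_in=4, n_hidden=20, n_out=2) matches the assumed input $x$.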
It is not easy to show how to tune the action-generating NN using only the reinforcement signal $R(t)$, which contains significantly less information than the full error signal $r(t)$. A successful proof can be based on the Lyapunov energy function

$$L(t) = \sum_{i=1}^{n} |r_i| + \frac{1}{2}\,\mathrm{tr}\bigl(\tilde{W}^{T} F^{-1} \tilde{W}\bigr)$$

where $\tilde{W} = W - \hat{W}$ is the NN weight estimation error and $F = F^{T} > 0$ is the tuning gain matrix.
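
Since the ideal weights $W$ are unknown in practice, $L(t)$ can only be evaluated in simulation, where a nominal $W$ is postulated. The following sketch (function name and arguments are illustrative, reusing numpy from the sketches above) computes $L(t)$ so its evolution can be monitored.

    def lyapunov_energy(r, W_hat, W_star, F):
        """Evaluate L(t) = sum_i |r_i| + (1/2) tr(Wtilde^T F^{-1} Wtilde).

        W_star stands in for the ideal NN weights, which are unknown in
        practice; this check is therefore for simulation studies only.
        """
        W_tilde = W_star - W_hat                  # weight estimation error
        quad = 0.5 * np.trace(W_tilde.T @ np.linalg.solve(F, W_tilde))
        return np.sum(np.abs(r)) + quad

A tuning law consistent with the proof should render $L(t)$ nonincreasing along closed-loop trajectories, so watching this quantity in simulation is a quick sanity check on a candidate implementation.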