Page 824 - Mechanical Engineers' Handbook (Volume 2)


                                   0 = w_ij^T ∇σ(x) ẋ + r(x, u_j, d_i)
                                     = w_ij^T ∇σ(x) F(x, u_j, d_i) + h^T h + u_j^T R u_j − γ^2 d_i^T d_i
                           which can easily be solved for the weights using, for example, least squares. Then, on
                           disturbance iterations the next disturbance is given by
                                   d_{i+1}(x) = (1/(2γ^2)) k^T(x) ∇σ^T(x) w_ij
                           and on control iterations the improved control is given by
                                   u_{j+1}(x) = −(1/2) R^{-1} g^T(x) ∇σ^T(x) w_ij
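The least-squares step for the weights can be illustrated numerically. The sketch below is purely illustrative (the basis σ, the dynamics, the fixed policies, and the attenuation value are made-up assumptions, not taken from the text): it samples the residual 0 = w^T ∇σ(x) ẋ + r(x, u, d) at several states and solves the resulting linear system for w.

```python
import numpy as np

# Value approximation V(x) ≈ w^T sigma(x) with a polynomial basis (hypothetical choice).
def sigma_grad(x):
    # gradients of the basis functions [x^2, x^4]
    return np.array([2.0 * x, 4.0 * x**3])

# Illustrative scalar dynamics and fixed policy iterates (all assumed, not from the text).
f = lambda x, u, d: -x + u + d                    # xdot = F(x, u, d)
u_pol = lambda x: -0.5 * x                        # current control iterate u_j
d_pol = lambda x: 0.1 * x                         # current disturbance iterate d_i
gamma2 = 4.0                                      # gamma^2, attenuation level (assumed)
r = lambda x, u, d: x**2 + u**2 - gamma2 * d**2   # utility h^T h + u^T R u - gamma^2 d^T d

# Sample states and form one residual row per sample:
#   w^T (grad sigma(x) * xdot) = -r(x, u, d)
xs = np.linspace(-2.0, 2.0, 50)
A = np.array([sigma_grad(x) * f(x, u_pol(x), d_pol(x)) for x in xs])
b = np.array([-r(x, u_pol(x), d_pol(x)) for x in xs])
w, *_ = np.linalg.lstsq(A, b, rcond=None)
print(w)  # weights of the value approximation for this (u, d) pair
```

Each iteration of the algorithm refits w this way for the current policy pair before the disturbance or control update is applied.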
                           This algorithm is shown to converge to the approximately optimal H∞ solution. This yields
                           an NN feedback controller as shown in Fig. 17 for the H∞ case.
            10   APPROXIMATE DYNAMIC PROGRAMMING AND ADAPTIVE CRITICS
                           Approximate dynamic programming (ADP) is based on the optimal formulation of the
                           feedback control problem. For discrete-time systems, the optimal control problem may be
                           solved using dynamic programming, which is a backward-in-time procedure and so unsuitable
                           for online implementation.37 ADP is based on using nonlinear approximators to solve the HJ
                           equations forward in time and was first suggested by Werbos. See Section 11 for cited
                           works of major researchers in the area of ADP.44 The current status of work in ADP is given
                           in Ref. 45.
                              The previous section presented the continuous-time formulation of the optimal control
                           problem. For discrete-time systems of the form
                                         x_{k+1} = f(x_k, u_k)
                           with k the time index, one may select the cost or performance measure
                                      V(x_k) = Σ_{i=k}^∞ γ^{i−k} r(x_i, u_i)

                           with γ a discount factor and r(x_k, u_k) known as the instantaneous utility. A first-difference
                           equivalent to this yields a recursion for the value function given by
                                    V(x_k) = r(x_k, u_k) + γ V(x_{k+1})
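This recursion can be checked numerically against the truncated discounted sum along a rollout. A minimal sketch, assuming an illustrative scalar system, policy, and utility (none of which are from the text):

```python
import numpy as np

# Illustrative scalar system, utility, and fixed policy (all hypothetical).
f = lambda x, u: 0.9 * x + u
r = lambda x, u: x**2 + u**2
policy = lambda x: -0.1 * x
gamma = 0.95  # discount factor

# Roll out the policy and accumulate the discounted cost
#   V(x_k) = sum_{i>=k} gamma^(i-k) r(x_i, u_i), truncated at N steps.
def rollout_value(x0, N=500):
    V, x = 0.0, x0
    for i in range(N):
        u = policy(x)
        V += gamma**i * r(x, u)
        x = f(x, u)
    return V

# The recursion V(x_k) = r(x_k, u_k) + gamma * V(x_{k+1}) holds along the rollout:
x0 = 1.0
u0 = policy(x0)
lhs = rollout_value(x0)
rhs = r(x0, u0) + gamma * rollout_value(f(x0, u0))
print(lhs, rhs)  # equal up to truncation error
```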
                           One may invoke Bellman’s principle to find the optimal cost as
                                 V*(x_k) = min_{u_k} (r(x_k, u_k) + γ V*(x_{k+1}))
                           and the optimal control as
                               u*(x_k) = arg min_{u_k} (r(x_k, u_k) + γ V*(x_{k+1}))
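On a small finite problem these two equations can be evaluated directly by stepping backward in time. A minimal sketch with a hypothetical five-state chain and three actions (all numerical choices are illustrative, not from the text):

```python
import numpy as np

# Hypothetical finite problem: states 0..4, actions move left / stay / right.
n_states, actions = 5, (-1, 0, 1)
gamma = 0.9
f = lambda x, u: min(max(x + u, 0), n_states - 1)  # deterministic, clamped transition
r = lambda x, u: (x - 2) ** 2 + 0.1 * abs(u)       # utility penalizes distance from state 2

# Backward-in-time dynamic programming over a finite horizon N:
#   V*_k(x) = min_u [ r(x, u) + gamma * V*_{k+1}(f(x, u)) ]
N = 50
V = np.zeros(n_states)  # terminal value V*_N = 0
for k in range(N - 1, -1, -1):
    V = np.array([min(r(x, u) + gamma * V[f(x, u)] for u in actions)
                  for x in range(n_states)])

# Optimal control via the arg-min, using the (near-stationary) value after N steps:
u_star = [min(actions, key=lambda u: r(x, u) + gamma * V[f(x, u)])
          for x in range(n_states)]
print(V, u_star)
```

Note that the value array must be filled in for all states at stage k+1 before stage k can be computed, which is exactly the backward-in-time structure the text describes.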
                           Determining the optimal controller using these equations requires an iterative procedure
                           known as dynamic programming that progresses backward in time. This is unsuitable for
                           real-time implementation and is computationally complex.
                              The goal of ADP is to provide approximate techniques for evaluating the optimal value
                           and optimal control using techniques that progress forward in time, so that they can be
                           implemented in actual control systems. Howard46 showed that the following successive
                           iteration scheme, known as policy iteration, converges to the optimal solution:
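In outline, policy iteration alternates policy evaluation (solve for the value of the current policy) with policy improvement (a greedy update using that value). A minimal sketch on a hypothetical finite deterministic problem (all numbers illustrative, not from the text):

```python
import numpy as np

# Hypothetical finite problem in the spirit of x_{k+1} = f(x_k, u_k).
n_states, actions, gamma = 5, (-1, 0, 1), 0.9
f = lambda x, u: min(max(x + u, 0), n_states - 1)
r = lambda x, u: (x - 2) ** 2 + 0.1 * abs(u)

policy = [0] * n_states          # initial policy: always "stay"
for _ in range(20):              # outer policy-iteration loop
    # Policy evaluation: solve V(x) = r(x, pi(x)) + gamma * V(f(x, pi(x)))
    # exactly, as the linear system (I - gamma * P) V = r_pi.
    P = np.zeros((n_states, n_states))
    r_pi = np.zeros(n_states)
    for x in range(n_states):
        P[x, f(x, policy[x])] = 1.0
        r_pi[x] = r(x, policy[x])
    V = np.linalg.solve(np.eye(n_states) - gamma * P, r_pi)

    # Policy improvement: greedy with respect to the evaluated value.
    new_policy = [min(actions, key=lambda u: r(x, u) + gamma * V[f(x, u)])
                  for x in range(n_states)]
    if new_policy == policy:     # no change: the policy is optimal
        break
    policy = new_policy
print(policy, V)
```

Unlike backward dynamic programming, each pass works with a complete policy defined over all states, so the scheme can run alongside the system rather than requiring the whole time horizon in advance.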