Page 819 - Mechanical Engineers' Handbook (Volume 2)

810   Neural Networks in Feedback Control Systems

                          where $W^T = [w_{kj}]$ is a matrix of output representative values and the FL basis functions
                          $\phi_j(\cdot)$ play the role of NN activation functions. Using product inferencing, the basis
                          functions are given in terms of the one-dimensional membership functions (MFs)
                          $\mu_{ij}(x_i, U_{ij})$ by
                          $$\phi_j(x, U) = \frac{\prod_{i=1}^{n} \mu_{ij}(x_i, U_{ij})}{\sum_{j=1}^{L} \prod_{i=1}^{n} \mu_{ij}(x_i, U_{ij})}$$
                          where $U_{ij}$ is a vector of parameters of the MFs, including the centroids and spreads. The
                          number of rules is $L$. The standard choice for the MFs is triangle functions. However, other
                          choices have been used, including splines (cf. Ref. 4, CMAC NN), second- or third-degree
                          polynomials, and RBFs (Ref. 36).
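As a minimal sketch of the product-inference formula above, the following Python code evaluates the normalized basis functions for triangular MFs. The function names, the (centroid, spread) layout of each parameter vector, and the output map y = W^T phi(x, U) are assumptions made for illustration; they are not taken from the handbook.

```python
from math import prod


def triangle_mf(x, centroid, spread):
    """Triangular MF: 1 at the centroid, falling linearly to 0 at +/- spread."""
    return max(0.0, 1.0 - abs(x - centroid) / spread)


def fl_basis(x, U):
    """Normalized product-inference basis functions phi_j(x, U).

    x: list of n scalar inputs.
    U: U[i][j] = (centroid, spread) of the MF for input i in rule j
       (hypothetical layout chosen for this sketch).
    Returns a list of L values, one per rule.
    """
    n = len(x)
    L = len(U[0])
    # Numerator for rule j: product over inputs i of mu_ij(x_i, U_ij).
    firing = [prod(triangle_mf(x[i], *U[i][j]) for i in range(n))
              for j in range(L)]
    # Denominator: sum of the rule firing strengths over all L rules.
    total = sum(firing)
    if total == 0.0:
        raise ValueError("x lies outside the support of every rule")
    return [f / total for f in firing]


def fl_output(x, U, Wt):
    """Assumed output map y_k = sum_j w_kj * phi_j(x, U), with Wt[k][j] = w_kj."""
    phi = fl_basis(x, U)
    return [sum(w * p for w, p in zip(row, phi)) for row in Wt]


# Two inputs, two rules; every MF has spread 1, with centroids at 0 and 1.
U = [[(0.0, 1.0), (1.0, 1.0)],
     [(0.0, 1.0), (1.0, 1.0)]]
print(fl_basis([0.25, 0.5], U))                 # [0.75, 0.25]
print(fl_output([0.25, 0.5], U, [[2.0, 4.0]]))  # [2.5]
```

Because of the normalization, the basis functions sum to one wherever at least one rule fires, so each output is a weighted average of the representative values in the corresponding row of W^T.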
                             FL systems have the connotation of higher-level supervisors since they are rule based.
                          The fuzzy-neural reinforcement learning scheme shown in Fig. 16 has been developed, where
                          a FL system serves as a critic and a NN serves as an action-generating network that controls
                          the system. The reinforcement controller is adaptive in the sense that the FL critic is tuned
                          as well as the NN action-generating network to improve system performance through online
                          learning. Stability and convergence proofs have been provided and depend on using certain
                          specialized tuning schemes for the FL critic membership functions and the NN weights.
                          Tuning the membership functions has the effect of modifying them so they converge onto
                          the region with the highest state trajectory activity, a form of dynamic focusing of awareness.
                             The advantage of the FL/NN adaptive reinforcement learning structure is that the critic
                          can be initialized using linguistic/heuristic notions by the human user. Finally, for FL
                          systems one can look at the final MFs and interpret what information has been stored in the
                          system through learning.
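One way to picture initialization from linguistic notions is to give each label a triangular MF on an evenly spaced grid. The sketch below is purely illustrative: the function name, the labels, and the even spacing are assumptions, and it does not reproduce the tuning schemes referred to in the text.

```python
def linguistic_mfs(labels, lo, hi):
    """Map linguistic labels to (centroid, spread) pairs evenly spaced on [lo, hi].

    Hypothetical helper: the spread equals the grid step, so neighboring
    triangular MFs overlap and cross at membership 0.5.
    """
    if len(labels) < 2:
        raise ValueError("need at least two labels")
    step = (hi - lo) / (len(labels) - 1)
    return {name: (lo + k * step, step) for k, name in enumerate(labels)}


# Initial critic MFs for a normalized error signal on [-1, 1].
initial = linguistic_mfs(["negative", "zero", "positive"], -1.0, 1.0)
for name, (centroid, spread) in initial.items():
    print(f"{name}: centroid={centroid}, spread={spread}")
```

After learning, comparing the tuned (centroid, spread) pairs with these initial values label by label is one simple way to read off what the system has stored.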
           9  OPTIMAL CONTROL USING NNs
                          Heretofore we have discussed the design of NN controllers for tracking and stabilization
                          based on control theory techniques including feedback linearization, backstepping, singular
                          perturbations, force control, dynamic inversion, and observer design. The point was made


                                     Figure 16 Fuzzy logic adaptive reinforcement learning NN controller.
                                     [Block diagram. Blocks: performance evaluator, FL adaptive critic, action-generating NN, unknown plant. Signals: desired trajectory, instantaneous utility r(t), R(t), tuning, \hat{f}(x), u(t), x(t), d(t).]