Page 402 - Mechatronics for Safety, Security and Dependability in a New Era
P. 402

Ch78-I044963.fm  Page 386  Monday, August 7, 2006  11:30 AM
            Ch78-I044963.fm
               386
               386    Page 386  Monday, August 7,2006  11:30 AM
               applications  (Morimoto  and Doya.). This paper used  fuzzy  ART  for  sate-space  segmentation.  It is  an
               incremental  category-space  construction  method  (Carpenter  et  al.).  Whenever  the  fuzzy  ART
               encounters  a new  situation,  it  generates  a new  category  node  in the  category  space. Furthermore,  we
               proposed  generating  methods  of a new node that  inherit the  state-value  and  the policy  from  a  similar
               node to increase the learning speed more.


               REINFORCEMENT   LEARNING WITH FUZZY-ART    STATE-SPACE  CONSTRUCTION
               Figure  1 shows the architecture  of proposed  system. This system based on actor-critic methods. Using
               the  sensed  state  from  environment,  the  actor  selects  an  action.  The  critic  evaluates  the  new  state  to
               determine  whether  the  state  has  gone  better  or  worse  than  expected.  The  TD-error  represents  this
               evaluation. The TD-error is used for the improvement of the policy and the update of the state-value. If
               TD-error is positive, it suggests the tendency to select the action  should be strengthened  for the  future.
               If TD-error  is  negative,  it  suggests  the  tendency  should  be weakened.  We  applied  the  fuzzy  ART  to
               generalization  of  sensed  states. In many  tasks, most  states  will  never  have  been  experienced  exactly
               before.  Fuzzy ART  is a kind  of  self-organized  clustering  method  and  it classifies  unknown  state  into
               the group of approximate states. One state code used for each group.


               FUZZY-ART

               The  fuzzy  ART consists  of two  fully  connected  layers. The  category  layer  contains  category  nodes to
               categorize a given vector. A weight vector  Wj shows the representative pattern  of category node . The
                                                                                     j
               sensed  data  are  normalized  with  complement  cording.  The  sensed  data  vector  and  its  complement
               vector  are  input  to  the  input  layer.  The  choice  function  Ti  is  defined  as  equation  (1).  Where  c  is  a
               choice  parameter,  the  operator  is  defined  by  (p A q) =min(p,  q)  and the  norm  | x  | is  the  sum  of  its
               components.  The category node i that has the maximal  Ti is called  a winner node. The winner node is
               judged  to cause resonance or mismatch reset according to the equation (2). If the match function  Mi is
               bigger  than  the vigilance  parameter  p, resonance  is  carried  out  and  the  weight  vector  Wi  is  updated
               according  to  the  equation  (3).  Where  (3 is  a learning  rate parameter.  When  mismatch  reset  procedure
               should be done, other category node that has the next maximal  T is chosen  again. When any category
               node  is not  selected,  a new node  is generated.  In normally,  default  values  are  given  to  the policy  and
               the state-value. In proposed methods, inherited value from  a similar node was used. The efficiency  was
               estimated in the simulations of a two-link manipulator and a multi-link mobile robot.
                      Fuzzy ART
                             i l l  A'1'2  ATi  A'
                      Category Layer^  ^  ^  ^
                                                                 "Critic  H CTD-error  j
                                                                Value function V  |  '  I
                                                       X(t)        SA\     Aclion  @ )
                      Input Lai
                                                        jsensor  output  (jW) Reward  1
                                                                  Environment
                           Figure 1: Architecture of actor-critic learning method  with fuzzy  ART
                                         „,  \x  A W,\
                                            c+ | w,                          (1)
                                                                             (2)
                                                                             (3)
   397   398   399   400   401   402   403   404   405   406   407