Page 202 - Mechatronics for Safety, Security and Dependability in a New Era

Ch39-I044963.fm  Page 186  Tuesday, August 1, 2006  3:15 PM
               RELATED WORKS

               While simple mobile robot behaviors can be learned with feed-forward neural networks, combinations
               of behaviors, where identical sensory inputs should sometimes trigger different actions, require
               additional coordinating mechanisms. For example, in Calabretta, Nolfi, Parisi, & Wagner (1998) a
               Khepera robot is trained to perform a garbage-collecting task, and the authors find a correspondence
               between specific behaviors and the evolved neural network modules. The interaction among these
               modules is controlled by selector neurons that give one module precedence over the others.
               In contrast to the above work, where the modules are physically separate entities, Ziemke (2000)
               interprets the trained Recurrent Neural Network (RNN) as a diachronically structured controller. In
               this case, instead of modules existing separately at the same time, a monolithic neural network
               instantiates different input-output mappings at different points in time. An important aspect of the
               mechanism by which RNNs achieve modularity is discussed in Cohen, Dunbar, & McClelland (1990),
               where the switching between two input-output mappings is achieved by attentional control: attention is
               viewed as "an additional source of input that provides contextual support for the processing of signals
               within a selected pathway" (p. 335). In RNNs, the source that provides contextual support favoring
               one of the competing input-output mappings is the context layer. The state maintained in the context
               layer disambiguates the inputs, so that different outputs can be obtained for similar inputs.
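
               The disambiguating role of the context layer can be illustrated with a minimal, hand-wired sketch.
               It is not the network of any of the cited works: the weights, thresholds, and action names below
               are assumptions chosen only to show that the same input yields different outputs depending on the
               maintained context.

```python
def step(v):
    """Binary threshold unit: fires (1) when its net input is positive."""
    return 1 if v > 0 else 0


def rnn_step(x, context):
    """One time step of a hand-wired recurrent controller (illustrative only).

    x:       binary sensory input (1 = stimulus present)
    context: binary context unit holding a copy of a previous hidden activation
    Returns (action, new_context).
    """
    h_a = step(x - context - 0.5)   # fires for: stimulus present, context off
    h_b = step(x + context - 1.5)   # fires for: stimulus present, context on
    if h_a:
        action = "avoid"
    elif h_b:
        action = "approach"
    else:
        action = "idle"
    # Elman-style context update: copy a hidden activation for the next step.
    return action, h_a


def run(inputs, context=0):
    """Feed a sequence of inputs through the controller, carrying the context."""
    actions = []
    for x in inputs:
        action, context = rnn_step(x, context)
        actions.append(action)
    return actions


print(run([1, 1, 1, 1]))  # → ['avoid', 'approach', 'avoid', 'approach']
```

               A feed-forward mapping from x alone could not produce this alternation, since the input is
               identical at every step; only the context value differs.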

               Since, in RNNs, the internal state plays a central role in switching between the alternative input-output
               mappings, the flexibility with which this internal state is updated and maintained directly affects the
               flexibility of the robot behaviors implemented by the network. The computational model of working
               memory based on the prefrontal cortex (PFC) and basal ganglia (the PBWM model), proposed in O'Reilly
               & Frank (2004), has the potential to provide such flexibility, which motivated us to investigate its
               application to learning combinations of robot behaviors.


               APPROACH

               In our approach, the PBWM model is used to implement several possible input-output
               mappings and then to learn specific combinations of them. In addition, a model of the environment is
               added to provide model-generated experience. We are interested in two consequences of using an
               environment model: lowering the costs associated with actually performing the actions, and extending
               the neural network model to a planning system that supports grounded representations.
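
               The idea of model-generated experience can be sketched as follows. This is only a tabular
               stand-in, assuming discrete, hashable inputs and actions; the class name, function name, and the
               example input and action labels are hypothetical and do not describe the neural environment
               model used in this work.

```python
class TabularEnvironmentModel:
    """Hypothetical stand-in for a learned environment model.

    It memorizes observed (input, action) -> next input transitions so that
    later experience can be generated without acting in the real environment.
    """

    def __init__(self):
        self._transitions = {}

    def record(self, sensory_input, action, next_input):
        """Store one transition observed while really performing an action."""
        self._transitions[(sensory_input, action)] = next_input

    def predict(self, sensory_input, action):
        """Return the remembered next input, or None if never observed."""
        return self._transitions.get((sensory_input, action))


def imagined_rollout(model, start_input, actions):
    """Chain model predictions; stop at the first unknown transition."""
    trajectory = [start_input]
    current = start_input
    for action in actions:
        current = model.predict(current, action)
        if current is None:
            break
        trajectory.append(current)
    return trajectory


model = TabularEnvironmentModel()
model.record("free", "forward", "wall_ahead")
model.record("wall_ahead", "turn_left", "free")
print(imagined_rollout(model, "free", ["forward", "turn_left", "forward"]))
# → ['free', 'wall_ahead', 'free', 'wall_ahead']
```

               Each imagined step replaces one real action, which is where the cost saving would come from;
               chaining such steps is the simplest form of look-ahead planning.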

               Working Memory Model

               Here we present an outline of the PBWM model (refer to O'Reilly & Frank (2004) for details). The
               model implementation is based on the Leabra framework (O'Reilly & Munakata, 2000), which uses a
               point-neuron activation function to model the neurons, k-Winners-Take-All inhibition to model
               competition among the neurons in a layer, and a combination of Hebbian and error-driven learning.
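
               As a rough illustration of k-Winners-Take-All competition, the sketch below places a shared
               inhibition threshold midway between the k-th and (k+1)-th strongest net inputs. This is a
               deliberate simplification and not the Leabra computation itself; the function name kwta and the
               midpoint rule are assumptions made for the example.

```python
def kwta(net_inputs, k):
    """Simplified k-Winners-Take-All over one layer (illustrative only).

    A single inhibition threshold is placed halfway between the k-th and
    (k+1)-th largest net inputs; each unit's activation is its net input
    minus that threshold, clipped at zero. With ties at the boundary,
    fewer than k units may remain active.
    """
    n = len(net_inputs)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of units")
    ranked = sorted(net_inputs, reverse=True)
    if k == n:
        threshold = float("-inf")          # no competition: every unit stays active
    else:
        threshold = 0.5 * (ranked[k - 1] + ranked[k])
    return [max(0.0, v - threshold) for v in net_inputs]


activations = kwta([0.2, 0.9, 0.5, 0.7], k=2)
active_units = [i for i, a in enumerate(activations) if a > 0]
print(active_units)  # → [1, 3]
```

               Only the two most strongly driven units remain active, which gives the sparse, competitive layer
               activity that the inhibition is meant to model.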

               The neural network structure (Figure 1c) consists of two groups of layers. The first group includes
               the Input, Hidden, Output, NextInput, and PFC layers. The NextInput layer is used for the
               environment model and will be explained later. The Input, Hidden, and Output layers form a
               standard three-layer neural network structure. The PFC layer is an improved context layer, which is
               bi-directionally connected with the Hidden layer and influences the input-output pathways. The
               PFC layer is divided into stripes to allow independent control over the updating and maintenance of
               parts of the activation state. The remaining layers form the second group, which implements a gating
               mechanism that controls the updating and maintenance of the PFC activation state. Generally, a
               positive reward stabilizes the current PFC activation state, while a negative reward causes
               (a part of) it to be updated and another state to be established.
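
               The effect of stripe-wise gating can be sketched in a few lines. In the PBWM model the gating
               signals are learned by the second group of layers; here they are derived directly from the sign
               of the reward, which is a strong simplification. The function names, the stripe contents, and the
               treatment of zero reward as "maintain" are all assumptions made for illustration.

```python
def gate_stripes(pfc_state, candidate_state, go):
    """Update only the stripes whose gate is open; maintain all the others."""
    if not (len(pfc_state) == len(candidate_state) == len(go)):
        raise ValueError("one state value and one gate flag per stripe are required")
    return [new if g else old
            for old, new, g in zip(pfc_state, candidate_state, go)]


def go_from_reward(reward, n_stripes, stripe_to_update):
    """Hand-written gating rule (the real model learns this instead).

    Non-negative reward: keep every stripe closed, stabilizing the state
    (treating zero reward as 'maintain' is an assumption).
    Negative reward: open a single stripe so that part of the state changes.
    """
    if reward >= 0:
        return [False] * n_stripes
    return [i == stripe_to_update for i in range(n_stripes)]


pfc = ["task_A", "left"]            # hypothetical contents of two stripes
candidate = ["task_B", "right"]     # what the current input would write

print(gate_stripes(pfc, candidate, go_from_reward(+1.0, 2, stripe_to_update=1)))
# → ['task_A', 'left']
print(gate_stripes(pfc, candidate, go_from_reward(-1.0, 2, stripe_to_update=1)))
# → ['task_A', 'right']
```

               Because each stripe has its own gate, one part of the maintained state can be replaced while
               the rest is preserved.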