Page 204 - Mechatronics for Safety, Security and Dependability in a New Era

Ch39-I044963.fm  Page 188  Tuesday, August 1, 2006  3:15 PM
               Two  groups of simulations  were performed:  with and without  model-generated  experience.  In each
               group, there  were two simulations: with the goal on the right and on the left.  After  training for 300
               sequences of real  experience, a test  consisting  of 10 trials, 50 sequences  each, was performed.  The
               test results are summarized in Figure 2.
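               The idea of mixing real and model-generated experience can be illustrated with tabular Dyna-Q
               (Sutton and Barto, 1998). The sketch below is not the PBWM-based network used in the simulations:
               the five-state corridor, the reward scheme, the learning parameters and all function names are
               assumptions made for illustration only.

```python
import random

ACTIONS = (-1, +1)  # move-left, move-right


def step(state, action, goal, n_states=5):
    """Hypothetical corridor: an episode ends at either end; reward 1 only at the goal."""
    nxt = state + action
    done = nxt in (0, n_states - 1)
    reward = 1.0 if nxt == goal else 0.0
    return nxt, reward, done


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One Q-learning update, used for both real and model-generated transitions."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def train(goal, episodes=300, planning_steps=0, seed=0, n_states=5, epsilon=0.1):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    model = {}  # (state, action) -> (reward, next_state, done), filled from real steps
    for _episode in range(episodes):
        s, done = n_states // 2, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                # Greedy choice with random tie-breaking.
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a, goal, n_states)
            q_update(q, s, a, r, s2, done)           # real experience
            model[(s, a)] = (r, s2, done)
            for _plan in range(planning_steps):      # model-generated experience
                ps, pa = rng.choice(sorted(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
            s = s2
    return q


def greedy_action(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Four settings, mirroring the 2 x 2 design: goal side x experience type.
for goal in (4, 0):
    for planning_steps in (0, 5):
        q = train(goal, planning_steps=planning_steps)
        print(goal, planning_steps, greedy_action(q, 2))
```

               Setting planning_steps to 0 corresponds to learning from real experience only; any positive value
               replays transitions from the learned model after each real step.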
               [Figure 2: bar chart. Vertical axis: number of sequences (0 to 50), with separate bars for left
               and right sequences. Horizontal axis: the four simulation settings, i.e. goal (right, left) for
               each type of experience (real; real and model-generated).]
               Figure 2. Plot of the average number and standard deviation of left  and right sequences over the  10 test
                          trials.  The horizontal axis shows the settings for the four  simulations.
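               The statistics plotted in Figure 2 can be computed as in the following minimal sketch. It assumes
               that each test trial is recorded as a list of "left"/"right" labels, one per sequence, and that the
               sample standard deviation is used; the source does not state which estimator was applied. The
               function name and the toy data are hypothetical and are not the reported results.

```python
from statistics import mean, stdev


def summarize_trials(trials):
    """Return {side: (mean count, sample std of count)} over all trials."""
    summary = {}
    for side in ("left", "right"):
        counts = [trial.count(side) for trial in trials]
        summary[side] = (mean(counts), stdev(counts))
    return summary


# Placeholder data only: 3 trials of 4 sequences each
# (the test in the text used 10 trials of 50 sequences).
toy_trials = [
    ["right", "right", "left", "right"],
    ["right", "right", "right", "right"],
    ["right", "left", "left", "right"],
]
summary = summarize_trials(toy_trials)
```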
               DISCUSSION
               From the simulation  results  in Figure 2, it can be seen that the neural network  learned to achieve the
               goal  state.  Also, the neural  network  trained  with  additional  model-generated  experience  performs
               better  than  the one trained  only  with  real  experience.  These  results  were  obtained  using  only the
               reward as a teaching  signal  (using  supervised  learning as in the original PBWM  model  leads to better
               results but is not suitable for experiments with planning).  Another result is evident from the obtained
               activation patterns in the PFC layer. The neural network shown in Figure 1c was trained to
               achieve the goal state on the right side. As can be seen, the active units are mostly those in the
               top row of the PFC stripes. They correspond to the units for move-right in the Hidden layer and
               consequently bias the neural network output to prefer this action in each state. Thus, the contents
               of the PFC layer can be interpreted as a simple plan (a combination of actions) leading to the goal
               state. Future work is directed toward using distributed representations in the network and toward
               more complex tasks.
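               The biasing effect described above can be sketched as a constant top-down term added to the net
               input of one action. The softmax readout, the numeric values and all names below are assumptions
               for illustration; they do not reproduce the Leabra activation dynamics of the PBWM model.

```python
import math


def action_probabilities(net_input, pfc_bias):
    """Softmax over per-action net input plus a constant top-down bias."""
    biased = {a: v + pfc_bias.get(a, 0.0) for a, v in net_input.items()}
    peak = max(biased.values())  # subtract the maximum for numerical stability
    exps = {a: math.exp(v - peak) for a, v in biased.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def preferred(probs):
    return max(probs, key=probs.get)


# Hypothetical bottom-up net inputs for three states.
bottom_up = {
    "state_1": {"move-left": 0.3, "move-right": 0.0},
    "state_2": {"move-left": 0.0, "move-right": 0.1},
    "state_3": {"move-left": 0.2, "move-right": 0.2},
}
plan_bias = {"move-right": 1.0}  # stands in for the active top-row PFC units

without_bias = {s: preferred(action_probabilities(x, {})) for s, x in bottom_up.items()}
with_bias = {s: preferred(action_probabilities(x, plan_bias)) for s, x in bottom_up.items()}
```

               Because the same bias is applied in every state, the preferred action becomes move-right
               throughout, which is the sense in which a sustained PFC pattern acts as a simple plan.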
               REFERENCES
               Calabretta R., Nolfi S., Parisi D., and Wagner G. (1998). Emergence of functional modularity in robots.
               In From Animals to Animats 5, Edited by Blumberg B., Meyer J.A., Pfeifer R., and Wilson S.W., MIT
               Press, Cambridge, pp. 497-504.
               Cohen J.D., Dunbar K., and McClelland J.L. (1990). On the control of automatic processes: A parallel
               distributed processing account of the Stroop effect. Psychological Review, 97:3, 332-361.
               O'Reilly R.C. and Frank M.J. (2004). Making working memory work: A computational model of
               learning in the prefrontal cortex and basal ganglia. Technical Report 03-03 (Revised Version Aug. 2,
               2004). University of Colorado Institute of Cognitive Science.
               O'Reilly R.C. and Munakata Y. (2000). Computational explorations in cognitive neuroscience:
               Understanding the mind by simulating the brain. MIT Press, Cambridge.
               Sutton R.S. and Barto A.G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge.
               Ziemke T. (2000). On 'parts' and 'wholes' of adaptive behavior: Functional modularity and
               diachronic structure in recurrent neural robot controllers. In From Animals to Animats 6 - Proceedings
               of the Sixth International Conference on the Simulation of Adaptive Behavior. MIT Press, Cambridge.