Two groups of simulations were performed: with and without model-generated experience. Each group contained two simulations: one with the goal on the right and one with the goal on the left. After training on 300 sequences of real experience, a test consisting of 10 trials of 50 sequences each was performed. The test results are summarized in Figure 2.
[Figure 2 here: bar chart. Vertical axis: number of sequences (0-50); bars: left seq. and right seq.; horizontal axis: goal (right, left) under each experience condition (real; real and model-generated).]
Figure 2. Plot of the average number and standard deviation of left and right sequences over the 10 test
trials. The horizontal axis shows the settings for the four simulations.
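As a concrete illustration of this protocol, the following is a minimal, self-contained sketch in the spirit of Dyna-style learning from model-generated experience (Sutton and Barto, 1998). The 1-D corridor task, the tabular Q-learner, and all names and parameters are illustrative assumptions standing in for the paper's network and task, not the authors' implementation.

import random
from collections import defaultdict

# Dyna-Q sketch of the protocol above: a 1-D corridor with the goal at
# one end stands in for the task, and tabular Q-learning with replay of
# model-generated transitions stands in for the PBWM-based network.

N_STATES, START = 5, 2              # corridor cells; start in the middle
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def run_sequence(Q, goal, model=None, learn=True):
    """One episode; returns which end of the corridor was reached."""
    s = START
    while s not in (0, N_STATES - 1):
        a = random.choice((-1, 1)) if random.random() < EPS \
            else max((-1, 1), key=lambda act: Q[(s, act)])
        s2, r = s + a, float(s + a == goal)   # reward is the only teacher
        if learn:
            best = max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
            if model is not None:
                model[(s, a)] = (r, s2)       # learn the world model
        s = s2
    return 'left' if s == 0 else 'right'

def planning_steps(Q, model, n=10):
    """Learn from model-generated experience (Dyna-style replay)."""
    for (s, a), (r, s2) in random.sample(list(model.items()),
                                         min(n, len(model))):
        best = max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

def experiment(goal, use_model):
    Q = defaultdict(float)
    model = {} if use_model else None
    for _ in range(300):                      # 300 real training sequences
        run_sequence(Q, goal, model)
        if use_model and model:
            planning_steps(Q, model)
    return [[run_sequence(Q, goal, learn=False) for _ in range(50)]
            for _ in range(10)]               # 10 test trials of 50 sequences

Calling experiment(goal=4, use_model=True) then corresponds to the rightmost condition in Figure 2 (goal: right, experience: real and model-generated).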
DISCUSSION
The simulation results in Figure 2 show that the neural network learned to achieve the goal state, and that the network trained with additional model-generated experience performs better than the one trained only on real experience. These results were obtained using only the reward as a teaching signal (supervised learning, as in the original PBWM model, leads to better results but is not suitable for experiments with planning). Another result is evident from the activation patterns obtained in the PFC layer. The neural network shown in Figure 1c was trained to achieve the goal state on the right side. As can be seen, the most active units are those in the top row of the PFC stripes. They correspond to the move-right units in the Hidden layer and consequently bias the network output to prefer this action in each state. Thus, the contents of the PFC layer can be interpreted as a simple plan (a combination of actions) leading to the goal state. Future work is directed toward distributed representations in the network and more complex tasks.
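To make this reading of the PFC contents as an action bias concrete, here is a toy sketch; the layer sizes, random input weights, and one-to-one PFC-to-hidden wiring are assumptions for illustration and do not reproduce the PBWM architecture.

import numpy as np

# Toy sketch (not the paper's network): units held active in a PFC
# "stripe" inject a constant bias into matching hidden units, tilting
# the output toward one action regardless of the current state.

rng = np.random.default_rng(0)
n_state, n_hidden = 4, 2                          # hidden unit 0 ~ "move right"

W_in = rng.normal(0.0, 0.1, (n_hidden, n_state))  # state -> hidden
W_pfc = np.eye(n_hidden)                          # PFC stripe -> hidden, 1:1
W_out = np.eye(n_hidden)                          # hidden -> actions

def act(state_vec, pfc_vec):
    """Choose an action; active PFC units bias matching hidden units."""
    hidden = np.tanh(W_in @ state_vec + W_pfc @ pfc_vec)
    return ('right', 'left')[int(np.argmax(W_out @ hidden))]

pfc_plan = np.array([1.0, 0.0])       # "move right" maintained in the stripe
for s in range(n_state):              # the maintained bias wins in every state
    state = np.zeros(n_state); state[s] = 1.0
    print(s, act(state, pfc_plan))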
REFERENCES
Calabretta R., Nolfi S., Parisi D., and Wagner G. (1998). Emergence of functional modularity in robots.
In From Animals to Animats 5, Edited by Blumberg B., Meyer J.A., Pfeifer R., and Wilson S.W., MIT
Press, Cambridge, pp 497-504.
Cohen J.D., Dunbar K., and McClelland J.L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97:3, 332-361.
O'Reilly R.C. and Frank M.J. (2004). Making working memory work: A computational model of
learning in the prefrontal cortex and basal ganglia. Technical Report 03-03 (Revised-Version Aug. 2,
2004). University of Colorado Institute of Cognitive Science.
O'Reilly R.C. and Munakata Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. MIT Press, Cambridge.
Sutton R.S. and Barto A.G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge.
Ziemke T. (2000). On 'parts' and 'wholes' of adaptive behavior: Functional modularity and diachronic structure in recurrent neural robot controllers. In From Animals to Animats 6 - Proceedings of the Sixth International Conference on the Simulation of Adaptive Behavior, MIT Press, Cambridge.