COMPUTATIONAL MODEL AND
ALGORITHM OF HUMAN PLANNING
H. Fujimoto, B. I. Vladimirov, and H. Mochiyama
Robotics and Automation Laboratory, Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan
ABSTRACT
In this paper, we investigate an application of a working memory model to learning robot behaviors.
We implement an extension that allows learning from model-based experience, which reduces the cost
of learning the desired robot behaviors and provides a basis for exploring neural-network-based,
human-like planning with grounded representations. In simulation, the approach was applied to
a random walk task and a basic plan was obtained in the working memory.
KEYWORDS
Human mimetics, Human behavior, Mobile robot, Planning
INTRODUCTION
Using neural networks, it is relatively easy to learn simple mobile robot behaviors, such as
approaching or wall following, separately, and with appropriate network architectures, combinations of such
behaviors can be learned as well. However, since these combinations are encoded in the network
weights, switching from one combination to another often requires retraining. An interesting
approach to the problem of switching among different mappings is the working memory
model recently proposed by O'Reilly & Frank (2004). It comes from the field of
computational neuroscience and models working memory in terms of the
prefrontal cortex (PFC) and the basal ganglia. An important aspect of applying this model to learn a
combination of behaviors is that the information defining that combination is maintained explicitly as
activation patterns in the PFC. Compared to a weight-based encoding, these activation patterns can
be updated faster, so switching among possible combinations becomes easier, as illustrated by the sketch below.
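To make the contrast concrete, the following is a minimal, purely illustrative Python sketch, not the O'Reilly & Frank implementation: a network with fixed weights whose input-output mapping is selected by a maintained activation pattern, so that switching behaviors amounts to changing that pattern rather than retraining. All names, sizes, and weight values here are hypothetical.

import numpy as np

# Illustrative sketch only (hypothetical names and sizes), not the
# O'Reilly & Frank (2004) PFC/basal ganglia model: all weights are fixed,
# and the behavior combination is selected by a maintained activation
# pattern. Switching behaviors only changes that pattern.

rng = np.random.default_rng(0)
n_sensors, n_hidden, n_motors, n_patterns = 8, 16, 2, 3

W_in = rng.normal(size=(n_hidden, n_sensors))    # sensor -> hidden (fixed)
W_wm = rng.normal(size=(n_hidden, n_patterns))   # maintained pattern -> hidden (fixed)
W_out = rng.normal(size=(n_motors, n_hidden))    # hidden -> motor (fixed)

def motor_command(sensors, wm_pattern):
    # The maintained pattern biases the hidden layer, selecting which
    # sensor-to-motor mapping is currently active.
    hidden = np.tanh(W_in @ sensors + W_wm @ wm_pattern)
    return W_out @ hidden

sensors = rng.normal(size=n_sensors)
wall_following = np.array([1.0, 0.0, 0.0])   # one maintained activation pattern
approaching = np.array([0.0, 1.0, 0.0])      # another; no retraining needed

print(motor_command(sensors, wall_following))
print(motor_command(sensors, approaching))

In this toy setting, changing the commanded behavior is a single vector assignment rather than a new round of weight updates, which is the property exploited when the combination of behaviors is held as PFC activation patterns.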
In this paper, an implementation of that working memory model is applied to a five-state random walk
task. Furthermore, an environment model is added to provide model-based learning, motivated by
the fact that reinforcement learning based only on real experience is associated with high costs (in
terms of time, energy, etc.) when applied to real robots. Using additional model-generated
experience helps to decrease the associated costs and also provides a link to planning, since, as argued
in Sutton & Barto (1998), planning can also be interpreted as learning from simulated experience. In
light of this interpretation, the information about the learned combination of behaviors
maintained in the working memory can be viewed as a simple plan for reaching the rewarded goal state.
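As a concrete illustration of this link between model-based learning and planning, the following is a minimal Dyna-Q-style sketch on a five-state random walk, in the spirit of Sutton & Barto (1998). It is a stand-in under stated assumptions rather than the paper's working memory implementation: the task layout (reward of 1 only at the right terminal, start in the middle state) and all parameter values are hypothetical.

import random
from collections import defaultdict

# Minimal Dyna-Q-style sketch of "learning from simulated experience"
# on a five-state random walk (states 0..4, terminal exits to the left
# of state 0 and to the right of state 4, reward 1 only on the right
# exit). Illustrative only; not the PFC/basal ganglia model of the paper.

N_STATES = 5
ACTIONS = (-1, +1)           # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
PLANNING_STEPS = 10          # model-generated updates per real step

Q = defaultdict(float)       # Q[(state, action)]
model = {}                   # model[(state, action)] = (reward, next_state)

def step(state, action):
    # Real environment: returns (reward, next_state); None means terminal.
    nxt = state + action
    if nxt < 0:
        return 0.0, None
    if nxt >= N_STATES:
        return 1.0, None
    return 0.0, nxt

def choose(state):
    # Epsilon-greedy action selection.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(s, a, r, s_next):
    # One-step Q-learning update, used for both real and simulated experience.
    target = r if s_next is None else r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(200):
    s = N_STATES // 2                      # start in the middle state
    while s is not None:
        a = choose(s)
        r, s_next = step(s, a)
        update(s, a, r, s_next)            # learn from real experience
        model[(s, a)] = (r, s_next)        # update the environment model
        for _ in range(PLANNING_STEPS):    # learn from simulated experience
            ps, pa = random.choice(list(model))
            pr, ps_next = model[(ps, pa)]
            update(ps, pa, pr, ps_next)
        s = s_next

print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)})

Each real step updates both the value estimates and a learned environment model, and the model is then sampled several times to generate additional experience; this sampling of the model is precisely the sense in which planning can be viewed as learning from simulated experience.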