Page 401 - Mechatronics for Safety, Security and Dependability in a New Era
Ch78-I044963.fm Page 385 Monday, August 7, 2006 11:30 AM
AN EFFECTIVE STATE-SPACE CONSTRUCTION METHOD
FOR REINFORCEMENT LEARNING OF MULTI-LINK
MOBILE ROBOTS
M. Nunobiki¹, K. Okuda¹ and S. Maeda²
¹ Department of Mechanical and System Engineering, University of Hyogo,
2167 Shosha, Himeji, Hyogo, JAPAN
² Department of Quality Assurance, Shin Caterpillar Mitsubishi LTD.
1106-4 Shimizu Uozumi, Akashi, Hyogo, JAPAN
ABSTRACT
One problem with reinforcement learning on real robots is that it requires a large number of trials. This
paper proposes a reinforcement learning method that uses fuzzy ART to segment the state-space.
Whenever fuzzy ART encounters a new situation, it adds a new category node to the state-space.
We propose methods for generating new category nodes that inherit the state-value and the policy
from a similar node. The proposed methods were evaluated in simulations of a two-link manipulator
and a multi-link mobile robot. The simulations confirmed that the proposed method increases the
learning speed and reduces the size of the state-space.
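
As a rough illustration of the idea summarized above, the following sketch shows a standard fuzzy ART categorizer in which a newly created category node copies the state-value and policy parameter of the most similar existing node. The class name, the parameter names (`vigilance`, `alpha`, `beta`), the scalar policy parameter, and the rule of inheriting from the node with the highest choice value are all assumptions made for illustration; they are not necessarily the inheritance methods evaluated in this paper.

```python
# Minimal sketch, not the authors' implementation. All names are hypothetical.

def complement_code(x):
    """Fuzzy ART complement coding: [x, 1 - x] for x in [0, 1]^n."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(a, b)]


class FuzzyARTStateSpace:
    def __init__(self, vigilance=0.9, alpha=0.001, beta=1.0):
        self.vigilance = vigilance  # match threshold (rho)
        self.alpha = alpha          # choice parameter
        self.beta = beta            # learning rate (1.0 = fast learning)
        self.weights = []           # one weight vector per category node
        self.values = []            # critic: state-value per node
        self.policies = []          # actor: one scalar parameter per node (assumed)

    def _choice(self, i, w):
        # Standard fuzzy ART choice function |I ^ w| / (alpha + |w|).
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def categorize(self, x):
        """Return the index of the node for state x, creating one if needed."""
        i = complement_code(x)
        norm_i = sum(i)
        # Search existing nodes in descending order of choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(i, self.weights[j]),
                       reverse=True)
        for j in order:
            w = self.weights[j]
            matched = fuzzy_and(i, w)
            if sum(matched) / norm_i >= self.vigilance:
                # Resonance: move the weight toward I ^ w and reuse node j.
                self.weights[j] = [self.beta * m + (1.0 - self.beta) * old
                                   for m, old in zip(matched, w)]
                return j
        # No node passed the vigilance test: a new situation, so add a node.
        self.weights.append(i)
        if order:
            source = order[0]  # assumed notion of "most similar" node
            self.values.append(self.values[source])
            self.policies.append(self.policies[source])
        else:
            self.values.append(0.0)
            self.policies.append(0.0)
        return len(self.weights) - 1
```

In this sketch a learner would call `categorize` on each observed (normalized) state and then update `values` and `policies` at the returned index with its usual actor-critic rule; the inheritance step only changes how a new node is initialized.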
KEYWORDS
Reinforcement learning, Actor-critic, State-space construction, Fuzzy ART, Inheritance of state-value,
Manipulator, Multi-link mobile robot, Action acquisition
INTRODUCTION
We have developed inchworm-type mobile robots to search for life in collapsed buildings. Although these
robots had neither legs nor wheels, they were able to advance by using a vertical undulatory motion of
the whole body (Takita et al.). These robots demonstrated high mobility (Nunobiki et al.). However,
generating suitable walking motions was difficult because humans cannot intuitively understand the
motions of a multi-link robot. Therefore, reinforcement learning (Sutton and Barto.) is a promising
approach to trajectory generation for these robots. This paper deals with the actor-critic learning method
(Barto and Sutton.). Although actor-critic methods require minimal computation to select
an action from a continuous-valued action space, their performance is not yet sufficient for application
to real robots (Morimoto and Doya.). A grid-like representation of the state-space was insufficient to the