[Figure 1: Image feature generation and selection models. (a) Image feature generation model: the observed image I_o (n_x × n_y) is filtered by the feature extractor F into I_f, reduced by averaging to I_c (n_cx × n_cy), and mapped through the matrix W and sigmoid function g(x) to the state s = (s_1, ..., s_m), from which an action a is selected. (b) Segmentation of supervised data: the state space is segmented into regions U_1, U_2, ..., U_r along episodes 1 to N_E, traced from start to end. (c) Image feature selection model: extractors F_1, ..., F_n produce filtered and reduced images I_f1, ..., I_fn and I_c1, ..., I_cn and substates s_1, ..., s_n via mapping matrices W_1, ..., W_n, which are combined into the state s_E used to select the action a.]
reinforcement learning. Mitsunaga and Asada (2000) proposed a method to select a landmark according to the information gain on action selection. In these methods, however, the image features used to detect the landmarks in the observed image are given a priori. It is desirable that the image features adapt to environmental changes.
This paper proposes a method in which a robot learns to select image feature extractors that it has generated itself, according to a task-relevant criterion. The generated feature extractors are not always suitable for new tasks, so the robot must learn to select those that accomplish the task. The selection criterion is the information gain calculated from given task instances (supervised data). Furthermore, using the subset of the supervised data that conveys local information about the task makes the selection mechanism more effective. The method is applied to indoor navigation.
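To make the criterion concrete, the following is a minimal sketch of how such an information gain could be computed from supervised task instances. It assumes discrete states and actions, and all function and variable names are illustrative assumptions rather than the paper's formulation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label array (e.g. supervised actions)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(states, actions):
    """Reduction in entropy of the supervised actions once the discrete
    state induced by a candidate feature extractor is known.
    `states` and `actions` hold one entry per supervised task instance."""
    h_prior = entropy(actions)
    h_cond = 0.0
    for s in np.unique(states):
        mask = states == s
        h_cond += mask.mean() * entropy(actions[mask])
    return h_prior - h_cond
```

Under these assumptions, the extractor whose induced states yield the largest gain best disambiguates the supervised actions and would be the one selected.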
THE BASIC IDEA
In the proposed method, a robot generates an image feature extractor necessary for action selection through visuo-motor map learning (Minato & Asada, 2003). The state calculation process is decomposed into feature extraction and state extraction (Figure 1(a)). The robot learns an effective feature extractor and state mapping matrix for a given task through a mapping from observed images to supervised actions. During feature extraction, the interactions between raw data are limited to local areas, while the connections between the filtered image and the state spread over the entire image to represent non-local interactions. The feature extractors are therefore expected to be more general and to serve as generalized knowledge for accomplishing tasks of a certain class.
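As a hedged illustration of this decomposition, the sketch below computes a state from an observed image along the lines of Figure 1(a). It assumes a convolution kernel as the feature extractor F, block averaging as the reduction step, and a sigmoid g(x); all names and sizes are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compute_state(I_o, F, W, block=4):
    """Observed image I_o (nx x ny) -> filtered image I_f -> reduced image
    I_c (ncx x ncy, by block averaging) -> state s = g(W @ vec(I_c)).
    F is the feature extractor kernel; W is the m x (ncx*ncy) mapping matrix."""
    I_f = convolve2d(I_o, F, mode="same")   # local filtering
    nx, ny = I_f.shape
    ncx, ncy = nx // block, ny // block
    # reduction by averaging over block x block pixel groups
    I_c = I_f[:ncx * block, :ncy * block].reshape(ncx, block, ncy, block).mean(axis=(1, 3))
    # global state extraction via the mapping matrix and sigmoid
    return sigmoid(W @ I_c.ravel())

# Example with illustrative sizes: 64x64 image, Sobel-like kernel, 4-dim state
rng = np.random.default_rng(0)
I_o = rng.random((64, 64))
F = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
W = rng.standard_normal((4, 16 * 16))
s = compute_state(I_o, F, W, block=4)
```

Note how the convolution confines interactions to local pixel neighborhoods, while W connects every reduced pixel to every state component, matching the local/non-local split described above.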
The robot calculates the filtered image I_f from the observed image I_o using the feature extractor F. The state s ∈ ℝ^m is calculated from the compressed image I_c as a sum of weighted pixel values. The robot decides the appropriate action for the current state s. The function model of the feature extractor is given, and the robot learns its parameters and the mapping matrix W by maximizing the information