gain of $s$ with respect to action $a$.
The robot, which generates one feature extractor for a given task, obviously needs multiple feature
extractors for more complex tasks. It is, however, unnecessary to learn a new feature extractor for
every given task: the generated feature extractors should be generalized to make the robot more adaptable.
In this method, the robot reuses a number of generated feature extractors from past experiences and
selects effective ones for action decision. The system is shown in Figure 1(c). The robot is given a
number of different feature extractors, but must select those which are appropriate for the given task.
The robot, therefore, learns the state mapping matrix using the supervised data and evaluates which
feature extractor is appropriate from the distribution of supervised data. If the robot uses all of the
supervised data in the evaluation, optimality in a local part of the task is lost. To evaluate the
effectiveness in the local task, the robot estimates which local task it is performing from the history of
observations and selects the feature extractor using a portion of the supervised data corresponding to
the local task.
SELECTIVE ATTENTION MECHANISM BASED ON GENERATED IMAGE FEATURE
EXTRACTORS
The System Overview
The robot is given $n$ different feature extractors ($F_i$, $i = 1, \dots, n$) and calculates the substate $s_i$
using the mapping matrix $W_i$ corresponding to $F_i$. Each mapping matrix is learned by maximizing the
information gain of $s_E$ (the direct product of $s_1, \dots, s_n$) with respect to the supervised action $a \in A$.
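In information-theoretic terms, the gain maximized here is the standard mutual information between the joint substate and the supervised action (stated for reference, using the notation of the surrounding text):

$$ I(s_E; a) = H(a) - H(a \mid s_E) $$

Since $H(a)$ is fixed by the supervised data, maximizing $I(s_E; a)$ is equivalent to minimizing the conditional entropy $H(a \mid s_E)$, which motivates the risk function introduced in the next subsection.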
The robot selects the feature extractor with the maximum expected information gain and decides the
appropriate action from the substate calculated with the selected extractor. A single feature extractor
is not always sufficient to decide the appropriate action; the robot therefore estimates the reliability
of the selected feature extractors and continues selecting until the reliability exceeds a given threshold.
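To make this loop concrete, the following is a minimal Python sketch. All helper names (info_gains, reliability_fn, policies) are hypothetical stand-ins, and combining the per-extractor action distributions by a product is an assumption; the paper does not prescribe an implementation.

```python
import numpy as np

def decide_action(substates, info_gains, reliability_fn, policies, threshold):
    """Select feature extractors until they are reliable enough, then act.

    substates[i]   : substate s_i computed from the image via F_i and W_i
    info_gains[i]  : expected information gain of extractor i
    reliability_fn : maps a list of selected indices to a reliability score
    policies[i]    : maps substate s_i to a probability vector over actions
    """
    remaining = list(range(len(substates)))
    selected = []
    while remaining:
        # Pick the unused extractor with the maximum expected information gain.
        best = max(remaining, key=lambda i: info_gains[i])
        remaining.remove(best)
        selected.append(best)
        # Stop once the selected extractors are judged reliable enough.
        if reliability_fn(selected) >= threshold:
            break
    # Combine per-extractor action distributions (product rule: an assumption).
    combined = policies[selected[0]](substates[selected[0]])
    for i in selected[1:]:
        combined = combined * policies[i](substates[i])
    return int(np.argmax(combined))
```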
For evaluation in the local task, the supervised data are segmented by temporal order. The robot selects
the sub-supervised data (a temporal segment) according to its history of observations, selects feature
extractors based on that segment, and decides an action using the selected extractors.
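As an illustration of the segment lookup, the sketch below splits the supervised episodes into temporally ordered segments and picks the one whose observations lie closest to the recent history. The Euclidean nearest-neighbour matching is an assumption chosen for simplicity, not the paper's method.

```python
import numpy as np

def select_segment(observation_history, segments):
    """Return the temporal segment of supervised data that best matches
    the robot's recent observations.

    observation_history : list of recent observation vectors (np.ndarray)
    segments            : list of segments; each segment is a list of
                          (observation, action) instances in temporal order
    """
    def mismatch(segment):
        obs = np.stack([o for o, _ in segment])
        # Mean distance from each recent observation to its nearest
        # observation in the segment (one simple matching criterion).
        return float(np.mean([
            np.min(np.linalg.norm(obs - h, axis=1))
            for h in observation_history
        ]))
    return min(segments, key=mismatch)
```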
State learning
First, the robot collects supervised successful instances of the given task over $N_L$ episodes. An episode
ends when the robot accomplishes the task. An instance $u$ consists of an observed image $I^u$ and a
given action $a^u$. Next, the robot learns the mapping matrices. The state $s_E^u$ consists of the substates $s_i^u$,
which are calculated from $I^u$ using $F_i$ and $W_i$ (the superscript denotes the corresponding instance).
The evaluation function used to learn $W_i$ maximizes the information gain of $s_E$ with respect to $a$,
which is equivalent to minimizing the following risk function $R$ (see Vlassis, Bunschoten, and Kröse
(2001)).
In Eqn. 1, $U$ denotes the set of all instances and $N$ denotes the number of instances. The probability