Page 253 - Mechatronics for Safety, Security and Dependability in a New Era

Ch48-I044963.fm  Page 237  Tuesday, August 1, 2006  4:04 PM
gain of $s$ with respect to action $a$.
A robot that generates one feature extractor per task will clearly need multiple feature extractors for more complex tasks. However, it is unnecessary to learn a new feature extractor for every task; instead, the generated feature extractors must generalize so that the robot becomes more adaptable.
In this method, the robot reuses a number of feature extractors generated from past experience and selects those that are effective for action decision. The system is shown in Figure 1(c). The robot is given a number of different feature extractors, but must select the ones appropriate for the given task. It therefore learns the state mapping matrices from the supervised data and evaluates which feature extractor is appropriate from the distribution of the supervised data. If the robot uses all of the supervised data in this evaluation, optimality in a local part of the task is lost. To evaluate effectiveness within the local task, the robot estimates from its history of observations which local task it is performing, and selects the feature extractor using only the portion of the supervised data that corresponds to that local task.


SELECTIVE ATTENTION MECHANISM BASED ON GENERATED IMAGE FEATURE EXTRACTORS
                  The System Overview

The robot is given $n$ different feature extractors $F_i$ ($i = 1, \ldots, n$) and calculates the substate $s_i \in S_i$ using the mapping matrix $W_i$ corresponding to $F_i$. Each mapping matrix is learned by maximizing the information gain of $s \in S$ (the direct product of $s_1, \ldots, s_n$) with respect to the supervised action $a \in A$.
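A minimal Python sketch of this state computation follows. It assumes that each feature extractor $F_i$ returns a feature vector and that the mapping matrix $W_i$ acts on it linearly, so that $s_i = W_i F_i(I)$; the helper names (`project`, `compute_state`) and the toy extractors are hypothetical and only illustrate how the substates combine into the state $s$.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # one row per substate dimension

def project(W: Matrix, f: Sequence[float]) -> Vector:
    """Assumed linear mapping s_i = W_i f, where f = F_i(I)."""
    return [sum(w * x for w, x in zip(row, f)) for row in W]

def compute_state(image: Sequence[float],
                  extractors: Sequence[Callable[[Sequence[float]], Sequence[float]]],
                  mappings: Sequence[Matrix]) -> List[Vector]:
    """Return the substates (s_1, ..., s_n); together they form the state s."""
    return [project(W, F(image)) for F, W in zip(extractors, mappings)]

# Hypothetical toy setup: a 4-pixel "image" and two extractors.
def left_half(img: Sequence[float]) -> Sequence[float]:
    return img[:2]

def right_half(img: Sequence[float]) -> Sequence[float]:
    return img[2:]

image = [0.0, 1.0, 2.0, 3.0]
W1 = [[1.0, 1.0]]               # 1-D substate from the left half
W2 = [[1.0, -1.0], [0.5, 0.5]]  # 2-D substate from the right half
state = compute_state(image, [left_half, right_half], [W1, W2])
print(state)  # [[1.0], [-1.0, 2.5]]
```

Keeping the substates as a list of separate vectors, rather than one concatenated vector, makes it easy to evaluate and select individual extractors later.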


The robot selects the feature extractor with the maximum expected information gain and decides the appropriate action from the substate calculated with the selected feature extractor. A single feature extractor is not always sufficient to decide the appropriate action, so the robot estimates the reliability of the selected feature extractors and repeats the selection until the reliability exceeds a given threshold.
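The selection loop can be sketched as follows, under several labeled assumptions: substates are quantized into discrete cells, the expected information gain of adding extractor $F_i$ is taken to be the empirical mutual information between the action and the selected substates on the supervised data, and reliability is the fraction of matching supervised instances that agree on the most frequent action. These definitions and all function names are illustrative assumptions, not the authors' formulas.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

# Assumed representation: (quantized substates s_1..s_n, action a).
Instance = Tuple[Tuple[Hashable, ...], Hashable]

def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def info_gain(data: Sequence[Instance], subset: List[int]) -> float:
    """Empirical I(a; s_subset) = H(a) - H(a | s_subset), in bits."""
    h_a = entropy(Counter(a for _, a in data))
    groups = {}
    for s, a in data:
        key = tuple(s[i] for i in subset)
        groups.setdefault(key, Counter())[a] += 1
    n = len(data)
    h_cond = sum(sum(c.values()) / n * entropy(c) for c in groups.values())
    return h_a - h_cond

def decide(data: Sequence[Instance], observation: Tuple[Hashable, ...],
           selected: List[int]) -> Tuple[Optional[Hashable], float]:
    """Most frequent action among instances that match the observation on the
    selected extractors, plus its relative frequency as a reliability score."""
    matches = Counter(a for s, a in data
                      if all(s[i] == observation[i] for i in selected))
    if not matches:
        return None, 0.0
    action, count = matches.most_common(1)[0]
    return action, count / sum(matches.values())

def select_and_decide(data: Sequence[Instance], observation: Tuple[Hashable, ...],
                      threshold: float = 0.9):
    """Greedily add the extractor with the largest information gain until the
    reliability reaches the threshold or no extractors remain."""
    n_extractors = len(observation)
    selected: List[int] = []
    action, reliability = None, 0.0
    while len(selected) < n_extractors:
        remaining = [i for i in range(n_extractors) if i not in selected]
        best = max(remaining, key=lambda i: info_gain(data, selected + [i]))
        selected.append(best)
        action, reliability = decide(data, observation, selected)
        if reliability >= threshold:
            break
    return selected, action, reliability

# Hypothetical data: extractor 0 (brightness) is irrelevant, extractor 1 (ball side) decides.
data = [
    (("dark", "ball_left"), "turn_left"),
    (("bright", "ball_left"), "turn_left"),
    (("dark", "ball_right"), "turn_right"),
    (("bright", "ball_right"), "turn_right"),
]
print(select_and_decide(data, ("dark", "ball_right")))  # ([1], 'turn_right', 1.0)
```

Here one extractor already gives full reliability, so the loop stops after a single selection; with less informative extractors it would keep adding them.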

To evaluate feature extractors within a local task, the supervised data are segmented in temporal order. The robot selects a subset of the supervised data according to its history of observations, uses that subset to select feature extractors, and decides an action with the selected ones.
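One possible realization of this temporal segmentation is sketched below. It assumes each episode is split into `k` equal parts by normalized time, that states stand in for observations, and that the current local task is the segment whose instances lie closest (mean nearest-neighbour distance) to the recent observation history. The segment count `k`, the distance criterion, and the function names are hypothetical choices, not taken from the text.

```python
import math
from typing import Hashable, List, Sequence, Tuple

# Assumed representation: (state vector, action).
Instance = Tuple[Tuple[float, ...], Hashable]

def segment_by_time(episodes: Sequence[Sequence[Instance]], k: int) -> List[List[Instance]]:
    """Split every episode into k segments by normalized time t / length
    and pool the same-index segments across episodes."""
    segments: List[List[Instance]] = [[] for _ in range(k)]
    for episode in episodes:
        length = len(episode)
        for t, instance in enumerate(episode):
            segments[t * k // length].append(instance)
    return segments

def estimate_segment(history: Sequence[Tuple[float, ...]],
                     segments: Sequence[Sequence[Instance]]) -> int:
    """Index of the segment with the smallest mean nearest-neighbour distance
    to the observations in the history."""
    best_score, best_idx = math.inf, -1
    for idx, seg in enumerate(segments):
        if not seg:
            continue
        score = sum(min(math.dist(x, s) for s, _ in seg) for x in history) / len(history)
        if score < best_score:
            best_score, best_idx = score, idx
    return best_idx

# Hypothetical 1-D episodes: move forward first, then turn.
episodes = [
    [((0.0,), "forward"), ((1.0,), "forward"), ((2.0,), "turn"), ((3.0,), "turn")],
    [((0.2,), "forward"), ((1.2,), "forward"), ((2.2,), "turn"), ((3.2,), "turn")],
]
segments = segment_by_time(episodes, k=2)
local = segments[estimate_segment([(2.5,), (2.9,)], segments)]
print([a for _, a in local])  # ['turn', 'turn', 'turn', 'turn']
```

The returned subset `local` would then take the place of the full supervised data when the robot evaluates and selects feature extractors.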
                  State learning

First, the robot collects supervised successful instances of the given task over $N_L$ episodes. An episode ends when the robot accomplishes the task. An instance $u$ consists of an observed image $I^u$ and a given action $a^u$. Next, the robot learns the mapping matrices. The state $s^u \in S$ consists of the substates $s_i^u$, which are calculated from $I^u$ using $F_i$ and $W_i$ (the superscript denotes the corresponding instance). The evaluation function used to learn $W_i$ is the information gain of $s \in S$ with respect to $a$, which is to be maximized. This is equivalent to minimizing the following risk function $R$ (see Vlassis, Bunschoten, and Kröse (2001)).
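As a hedged illustration, the sketch below assumes that $R$ is the average negative log-likelihood of the supervised actions, $R = -\frac{1}{N}\sum_{u \in U} \log p(a^u \mid s^u)$, with $p(a \mid s)$ estimated by a leave-one-out Gaussian kernel over the other instances. This form, the kernel width `h`, and the function names are assumptions rather than a transcription of Eqn. 1, and the optimization of $W_i$ that minimizes $R$ is not shown.

```python
import math
from typing import Hashable, Sequence, Tuple

State = Tuple[float, ...]

def gaussian_kernel(x: State, y: State, h: float) -> float:
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * h * h))

def action_probability(u: int, states: Sequence[State],
                       actions: Sequence[Hashable], h: float) -> float:
    """Leave-one-out kernel estimate of p(a^u | s^u) from the other instances."""
    num = den = 0.0
    for v, (s, a) in enumerate(zip(states, actions)):
        if v == u:
            continue
        k = gaussian_kernel(states[u], s, h)
        den += k
        if a == actions[u]:
            num += k
    return num / den if den > 0.0 else 0.0

def risk(states: Sequence[State], actions: Sequence[Hashable],
         h: float = 1.0, eps: float = 1e-12) -> float:
    """Assumed form R = -(1/N) * sum_u log p(a^u | s^u); eps guards against log(0)."""
    n = len(states)
    return -sum(math.log(max(action_probability(u, states, actions, h), eps))
                for u in range(n)) / n

# Hypothetical projected states: the same states score a low risk when nearby
# states share an action, and a high risk when they do not.
states = [(0.0,), (0.1,), (5.0,), (5.1,)]
separable = risk(states, ["left", "left", "right", "right"], h=0.5)
mixed = risk(states, ["left", "right", "left", "right"], h=0.5)
print(separable < mixed)  # True
```

A mapping $W_i$ that places instances with the same action close together yields a lower $R$, which is the intuition behind learning $W_i$ by minimizing this risk.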





In Eqn. 1, $U$ denotes the set of all instances and $N$ denotes the number of instances. The probability