Page 142 - Rapid Learning in Robotics

128                                  “Mixture-of-Expertise” or “Investment Learning”


weights / parameters $\omega_j$ determined (see Fig. 9.2, arrows (1)). It serves
together with the context information $c$ as a high-dimensional training data
vector for the META-BOX (2). During the investment learning phase the
META-BOX mapping is constructed, which can be viewed as the stage for
the collection of expertise in the suitably chosen prototypical contexts.
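
The investment learning phase described above can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the implementation used in the book: the T-BOX is assumed to be a line y = a*x + b with parameters omega = (a, b), the context c is assumed to be a scalar, the system `true_system` is invented for illustration, and an ordinary least-squares fit stands in for whatever learner implements the META-BOX. All function names are hypothetical.

```python
# Toy sketch of the investment learning phase (all names are hypothetical).

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def true_system(x, c):
    """Assumed context-dependent system: slope and offset vary with c."""
    return (1.0 + c) * x + 2.0 * c


def train_t_box(c):
    """(1) Train the T-BOX in context c; its parameters are omega = (a, b)."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [true_system(x, c) for x in xs]
    return fit_line(xs, ys)


# Prototypical contexts c_j and the T-BOX parameters omega_j learned in each.
contexts = [0.0, 0.5, 1.0]
omegas = [train_t_box(c) for c in contexts]

# (2) Train the META-BOX on the pairs (c_j, omega_j): one fit per parameter.
meta_a = fit_line(contexts, [w[0] for w in omegas])  # how slope a depends on c
meta_b = fit_line(contexts, [w[1] for w in omegas])  # how offset b depends on c
```

In this toy setting the two META-BOX fits capture how the T-BOX slope and offset depend on the context, which is the "expertise" collected across the prototypical contexts.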



                          9.2.2 One-shot Adaptation Phase



[Figure: a new context $c$ is fed to the Meta-Box (3), which supplies the parameters or weights $\omega$ to the T-Box; the T-Box relates $X_1$ and $X_2$ (4).]

Figure 9.3: The One-shot Adaptation Phase.



After the META-BOX has been trained, the task of adapting the “skill” to
a new system context is tremendously accelerated. Instead of any
time-consuming re-learning of the mapping $T$, this adjustment now takes the
form of an immediate META-BOX $\rightarrow$ T-BOX mapping or “one-shot
adaptation”. As illustrated in Fig. 9.3, the META-BOX maps a new (unknown)
context observation $c_{\text{new}}$ (3) into the parameter / weight set
$\omega_{\text{new}}$ for the T-BOX. Equipped with $\omega_{\text{new}}$, the
T-BOX provides the desired mapping $T_{\text{new}}$ (4).
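
The one-shot adaptation step can be illustrated with a small Python sketch. It is a toy under stated assumptions rather than the book's implementation: the stored prototype pairs, the scalar context, the linear T-BOX with parameters omega = (a, b), and the piecewise-linear interpolation used as a stand-in META-BOX are all hypothetical choices made for illustration.

```python
# Toy sketch of one-shot adaptation (all names and numbers are hypothetical).

# Assumed result of the investment phase: prototypical contexts c_j paired
# with the T-BOX parameters omega_j = (a, b) learned in each of them.
PROTOTYPES = [(0.0, (1.0, 0.0)), (0.5, (1.5, 1.0)), (1.0, (2.0, 2.0))]


def meta_box(c_new):
    """(3) Map a context observation to a T-BOX parameter set.

    Stand-in learner: piecewise-linear interpolation between prototypes,
    clamped to the nearest prototype outside their range.
    """
    if c_new <= PROTOTYPES[0][0]:
        return PROTOTYPES[0][1]
    for (c0, w0), (c1, w1) in zip(PROTOTYPES, PROTOTYPES[1:]):
        if c_new <= c1:
            t = (c_new - c0) / (c1 - c0)
            return tuple(p0 + t * (p1 - p0) for p0, p1 in zip(w0, w1))
    return PROTOTYPES[-1][1]


def t_box(x1, omega):
    """(4) The T-BOX mapping from X1 to X2, here a line y = a*x + b."""
    a, b = omega
    return a * x1 + b


omega_new = meta_box(0.25)   # no re-learning of T, just one evaluation
x2 = t_box(2.0, omega_new)   # T_new applied to an input
```

The point of the sketch is that adapting to the new context costs a single evaluation of `meta_box`; the T-BOX itself is never retrained.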

                          9.2.3 “Mixture-of-Expertise” Architecture


It is interesting to compare this approach with a feed-forward architecture
that Jordan and Jacobs (1994) termed “mixture-of-experts”. As illustrated in
Fig. 9.4, a number of “experts” receive the same input task variables together
with the context information $c$. In parallel, each expert produces an output
and contributes, with an individual weight, to the overall system result. All
these weights are determined by the “gating network”, based on the context
information $c$ (see also LLM discussion in Sec. 3.8).
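
The forward pass of such a gated architecture can be sketched as follows. This is a minimal illustration under stated assumptions: the two experts, the linear gating scores, and the use of a softmax to normalise the gating weights (a common choice in the mixture-of-experts literature) are hypothetical and are not taken from Fig. 9.4.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical experts: each receives the task input x and the context c.
def expert_low(x, c):
    return x


def expert_high(x, c):
    return 2.0 * x + 2.0


EXPERTS = [expert_low, expert_high]


def gating_network(c):
    """One weight per expert, computed from the context c only."""
    scores = [-4.0 * (c - 0.5), 4.0 * (c - 0.5)]  # assumed linear scores
    return softmax(scores)


def mixture_output(x, c):
    """All experts run in parallel; the gate blends their outputs."""
    weights = gating_network(c)
    outputs = [expert(x, c) for expert in EXPERTS]
    return sum(g * y for g, y in zip(weights, outputs))
```

Note the contrast with the one-shot adaptation scheme: here every expert is evaluated for every input and the context only blends their outputs, whereas the META-BOX uses the context to produce a single parameter set for one T-BOX.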