Page 85 - Artificial Intelligence for the Internet of Everything

Active Inference in Multiagent Systems  71



              surprise and divergence, obtaining that free energy is an upper bound on
              surprise:
              \[
                F(o, b) = \underbrace{-\ln p(o \mid m)}_{\text{surprise}}
                        + \underbrace{D_{KL}\left[ q(s \mid b) \,\|\, p(s \mid o, m) \right]}_{\text{divergence}}
                        \geq -\ln p(o \mid m),
              \]
              where D_KL[q(s|b) ‖ p(s|o,m)] is the Kullback-Leibler divergence between
              the recognition density q(s|b) and the true posterior of the world states
              p(s|o,m) = p(s|o). Consequently, the minimization of free energy achieves
              the approximate minimization of surprise, at which point the perceptions
              q(s|b) are equal to the posterior density p(s|o,m).
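
The bound can be checked numerically on a small discrete example. The sketch below is illustrative only: the two-state prior, the likelihood values, and the helper names `kl` and `free_energy` are assumptions rather than anything defined in the text, and free energy is computed from the standard variational form E_q[ln q(s|b) - ln p(o,s|m)], which is assumed to match the definition used earlier in the chapter.

```python
import math

def kl(q, p):
    """Kullback-Leibler divergence D_KL[q || p] for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def free_energy(q, prior, likelihood_o):
    """F = E_q[ln q(s) - ln p(o, s | m)] for one fixed observation o."""
    return sum(qi * (math.log(qi) - math.log(pi * li))
               for qi, pi, li in zip(q, prior, likelihood_o) if qi > 0)

# Hypothetical two-state model m (values chosen only for illustration).
prior = [0.7, 0.3]          # p(s | m)
likelihood_o = [0.2, 0.9]   # p(o | s, m) for the observation actually received

evidence = sum(p * l for p, l in zip(prior, likelihood_o))           # p(o | m)
posterior = [p * l / evidence for p, l in zip(prior, likelihood_o)]  # p(s | o, m)
surprise = -math.log(evidence)

q = [0.5, 0.5]              # an arbitrary recognition density q(s | b)
F = free_energy(q, prior, likelihood_o)
divergence = kl(q, posterior)

# The gap between free energy and surprise when q equals the true posterior.
gap_at_posterior = free_energy(posterior, prior, likelihood_o) - surprise

print(f"F = {F:.4f}, surprise + divergence = {surprise + divergence:.4f}")
print(f"gap at posterior = {gap_at_posterior:.2e}")
```

For any choice of q the two printed quantities on the first line agree, and the gap closes only when q coincides with the posterior, which is the sense in which minimizing free energy approximately minimizes surprise.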
                 Second, we can rewrite the free energy as the difference between com-
              plexity and accuracy:

              \[
                F(o, b) = \underbrace{D_{KL}\left[ q(s \mid b) \,\|\, p(s \mid m) \right]}_{\text{complexity}}
                        - \underbrace{E_q\left[ \ln p(o(a) \mid s, m) \right]}_{\text{accuracy}}
              \]
                 Here, D_KL[q(s|b) ‖ p(s|m)] is a measure of divergence between the
              recognition density q(s|b) and prior beliefs about the world p(s|m),
              interpretable as a measure of complexity; the second component is the
              expectation about the observations o to be received after performing an
              action a, which represents accuracy. This result means that the agent
              modifies its sensory outputs o = o(a) through action a to achieve the most
              accurate explanation of data under fixed complexity costs. Accordingly, we
              can now define the free
              energy minimization using two sequential phases that separate estimation
              and control:
              •  Perception phase finds beliefs b* = arg min_b F(o, b); and
              •  Control phase finds actions a* = arg min_a F(o(a), b*).
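
The two phases can be sketched on a toy discrete model, using the complexity-minus-accuracy form of the free energy. Everything specific here is an assumption made for illustration: the two-state prior, the likelihood table, the deterministic action-to-observation map `OBSERVATION_OF_ACTION`, the scalar belief parameter b with q(s|b) = [b, 1 - b], and the grid search used in place of whatever optimizer an actual agent would employ.

```python
import math

PRIOR = [0.7, 0.3]                                # p(s | m), hypothetical
LIKELIHOOD = {0: [0.8, 0.1],                      # p(o=0 | s, m) for s = 0, 1
              1: [0.2, 0.9]}                      # p(o=1 | s, m) for s = 0, 1
OBSERVATION_OF_ACTION = {"stay": 0, "move": 1}    # hypothetical deterministic o(a)

def q_of(b):
    """Recognition density q(s | b) over two states, with 0 < b < 1."""
    return [b, 1.0 - b]

def complexity(b):
    """D_KL[q(s|b) || p(s|m)]."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q_of(b), PRIOR))

def accuracy(o, b):
    """E_q[ln p(o | s, m)]."""
    return sum(qi * math.log(li) for qi, li in zip(q_of(b), LIKELIHOOD[o]))

def free_energy(o, b):
    """F(o, b) = complexity - accuracy."""
    return complexity(b) - accuracy(o, b)

def perceive(o, grid=1000):
    """Perception phase: b* = arg min_b F(o, b), found by grid search."""
    candidates = [i / grid for i in range(1, grid)]
    return min(candidates, key=lambda b: free_energy(o, b))

def act(b_star):
    """Control phase: a* = arg min_a F(o(a), b*), with beliefs held fixed."""
    return min(OBSERVATION_OF_ACTION,
               key=lambda a: free_energy(OBSERVATION_OF_ACTION[a], b_star))

o = 1                      # observation received
b_star = perceive(o)       # beliefs that best explain o
a_star = act(b_star)       # action whose observation is best explained by b*
print(b_star, a_star)
```

Because b* is held fixed during the control phase, the complexity term is the same for every action, so the chosen action is simply the one with the highest accuracy. In the perception phase, b* lands on the grid point nearest the exact posterior p(s=0|o,m) = 0.14/0.41.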
              The control phase produces a policy for an agent to generate observations
              that entail, on average, the smallest free energy. This result ensures that
              the individual actions produced over time are not deterministic, and that
              the control phase can be converted into a sampling process a* ~ Q(a, b, o)
              as a function of exploration bonus plus expected utility (Friston et al.,
              2013) or average free energy (Friston, Samothrakis, & Montague, 2012).
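
One common way to turn such a minimization into a sampling process is a softmax over negative average free energy. The sketch below assumes that form; the text does not specify the shape of Q, and the `precision` parameter, the function names, and the per-action free energy values are hypothetical.

```python
import math
import random

def action_distribution(avg_free_energy, precision=1.0):
    """Assumed form: Q(a) proportional to exp(-precision * average free energy of a)."""
    lowest = min(avg_free_energy.values())   # shift for numerical stability
    weights = {a: math.exp(-precision * (f - lowest))
               for a, f in avg_free_energy.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def sample_action(q, rng):
    """Draw one action a* ~ Q."""
    actions = list(q)
    return rng.choices(actions, weights=[q[a] for a in actions], k=1)[0]

# Hypothetical average free energy per action (illustrative numbers only).
avg_free_energy = {"stay": 1.59, "move": 0.62}

Q = action_distribution(avg_free_energy, precision=2.0)
rng = random.Random(0)     # seeded so the run is repeatable
samples = [sample_action(Q, rng) for _ in range(1000)]
print(Q)
```

Actions with lower average free energy are sampled more often but never exclusively, which illustrates why the individual actions produced over time are not deterministic.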
              Further, the free energy is dependent on the agent’s model m, which can
              be adapted to minimize its free energy via an evolutionary or neuro-
              developmental optimization. This process is distinct from perception; it
              entails changing the form and architecture of the agent (Friston, Thornton,
              & Clark, 2012). This change means that the free energy function can be used
              to compare two or more agents (models) to each other (a better agent is the