Page 85 - Artificial Intelligence for the Internet of Everything
Active Inference in Multiagent Systems
surprise and divergence, obtaining that free energy is an upper bound on surprise:

F(o, b) = \underbrace{-\ln p(o \mid m)}_{\text{surprise}} + \underbrace{D_{\mathrm{KL}}\left[ q(s \mid b) \,\|\, p(s \mid o, m) \right]}_{\text{divergence}}
where D_KL[q(s|b) || p(s|o,m)] is the Kullback-Leibler divergence between
the recognition density q(s|b) and the true posterior of the world states
p(s|o,m) = p(s|o). Consequently, the minimization of free energy achieves
the approximate minimization of surprise, at which point the perceptions
q(s|b) are equal to the posterior density p(s|o,m).
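The identity above can be checked numerically. Below is a minimal sketch on a hypothetical two-state model (the states, prior, and likelihood values are invented for illustration; they do not come from the chapter): it verifies that F(o,b) equals surprise plus the divergence term, so F is an upper bound on surprise that is attained when q(s|b) equals the posterior.

```python
import math

# Hypothetical two-state toy model (illustrative numbers only).
p_s = [0.7, 0.3]           # prior p(s|m)
p_o_given_s = [0.9, 0.2]   # likelihood p(o|s,m) for the one observed o

def free_energy(q):
    """F(o,b) = E_q[ln q(s) - ln p(o,s|m)]."""
    return sum(q[s] * (math.log(q[s]) - math.log(p_o_given_s[s] * p_s[s]))
               for s in range(2))

p_o = sum(p_o_given_s[s] * p_s[s] for s in range(2))  # evidence p(o|m)
surprise = -math.log(p_o)
posterior = [p_o_given_s[s] * p_s[s] / p_o for s in range(2)]  # p(s|o)

def kl(q, p):
    """Kullback-Leibler divergence D_KL[q || p] over two states."""
    return sum(q[s] * math.log(q[s] / p[s]) for s in range(2))

q = [0.5, 0.5]  # an arbitrary recognition density q(s|b)
# Identity: F = surprise + D_KL[q || posterior], hence F >= surprise.
assert abs(free_energy(q) - (surprise + kl(q, posterior))) < 1e-12
# At q = posterior the divergence vanishes and F equals surprise.
assert abs(free_energy(posterior) - surprise) < 1e-12
```

Because the divergence term is nonnegative, driving F down by adjusting the beliefs b can only shrink the gap to the (fixed) surprise.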
Second, we can rewrite the free energy as the difference between complexity and accuracy:

F(o, b) = \underbrace{D_{\mathrm{KL}}\left[ q(s \mid b) \,\|\, p(s \mid m) \right]}_{\text{complexity}} - \underbrace{E_q\left[ \ln p(o(a) \mid s, m) \right]}_{\text{accuracy}}
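The two decompositions are algebraically equivalent, which a short numerical check makes concrete. The sketch below reuses a hypothetical two-state model (illustrative numbers, not from the chapter) and confirms that F computed directly equals complexity minus accuracy.

```python
import math

# Hypothetical two-state setup (illustrative numbers only).
p_s = [0.7, 0.3]           # prior p(s|m)
p_o_given_s = [0.9, 0.2]   # likelihood p(o|s,m) for the observed o
q = [0.5, 0.5]             # recognition density q(s|b)

# Direct definition: F = E_q[ln q(s) - ln p(o,s|m)].
free_energy = sum(q[s] * (math.log(q[s]) - math.log(p_o_given_s[s] * p_s[s]))
                  for s in range(2))

# Complexity: D_KL[q(s|b) || p(s|m)], divergence from the prior.
complexity = sum(q[s] * math.log(q[s] / p_s[s]) for s in range(2))
# Accuracy: E_q[ln p(o|s,m)], expected log-likelihood of the observation.
accuracy = sum(q[s] * math.log(p_o_given_s[s]) for s in range(2))

# The decompositions agree: F = complexity - accuracy.
assert abs(free_energy - (complexity - accuracy)) < 1e-12
```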
Here, D_KL[q(s|b) || p(s|m)] is a measure of divergence between the recognition density q(s|b) and prior beliefs about the world p(s|m), interpretable as a measure of complexity; the second component is the expected log-likelihood of the observations o to be received after performing an action a, which represents accuracy. This result means that the agent modifies its sensory inputs o = o(a) through action a to achieve the most accurate explanation of data under fixed complexity costs. Accordingly, we can now define the free energy minimization using two sequential phases that separate estimation and control:
• Perception phase finds beliefs b* = arg min_b F(o, b); and
• Control phase finds actions a* = arg min_a F(o(a), b*).
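The two phases can be sketched with exhaustive search on a hypothetical discrete model (two states, two actions, and the action-to-observation mapping are all invented for illustration): perception optimizes the beliefs for the current observation, then control picks the action whose induced observation has the smallest free energy under those beliefs.

```python
import math

# Hypothetical model: two world states; each action a yields observation o(a).
p_s = [0.7, 0.3]                   # prior p(s|m)
p_o_given_s = {0: [0.9, 0.2],      # p(o=0 | s, m)
               1: [0.1, 0.8]}      # p(o=1 | s, m)

def F(o, q):
    """Variational free energy for observation o under beliefs q(s|b)."""
    return sum(q[s] * (math.log(q[s]) - math.log(p_o_given_s[o][s] * p_s[s]))
               for s in range(2))

# Perception phase: b* = arg min_b F(o, b), here by grid search over q.
o = 0
grid = [[x / 100, 1 - x / 100] for x in range(1, 100)]
b_star = min(grid, key=lambda q: F(o, q))

# Control phase: a* = arg min_a F(o(a), b*).
def o_of(a):
    # Hypothetical deterministic action -> observation mapping.
    return a

a_star = min([0, 1], key=lambda a: F(o_of(a), b_star))
```

On these numbers the perception phase recovers (to grid resolution) the posterior p(s|o=0), and the control phase selects the action whose observation that posterior explains best.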
The control phase produces a policy for an agent to generate observations
that entail, on average, the smallest free energy. This result ensures that
the individual actions produced over time are not deterministic, and that
the control phase can be converted into a sampling process a* ~ Q(a, b, o)
as a function of exploration bonus plus expected utility (Friston et al.,
2013) or average free energy (Friston, Samothrakis, & Montague, 2012).
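A minimal sketch of such a sampling process, under the common softmax assumption that Q(a|b,o) is proportional to exp(-G(a)) for some (expected) free energy G per action (the action names and G values below are invented for illustration, not taken from the cited papers):

```python
import math
import random

# Hypothetical expected free energies for three candidate actions.
G = {"left": 2.0, "stay": 0.5, "right": 1.2}

def sample_action(G, temperature=1.0):
    """Sample a ~ Q(a|b,o) with Q(a) proportional to exp(-G(a)/T):
    low-free-energy actions are likely, but no action is certain."""
    weights = {a: math.exp(-g / temperature) for a, g in G.items()}
    z = sum(weights.values())
    r, acc = random.random() * z, 0.0
    for a, w in weights.items():
        acc += w
        if r <= acc:
            return a
    return a  # numerical fallback

random.seed(0)
samples = [sample_action(G) for _ in range(1000)]
```

Sampling rather than taking the arg min is what keeps the individual actions non-deterministic: the lowest-G action dominates the samples, yet the others retain an exploration bonus.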
Further, the free energy is dependent on the agent’s model m, which can
be adapted to minimize its free energy via an evolutionary or neurodevelopmental optimization. This process is distinct from perception; it
entails changing the form and architecture of the agent (Friston, Thornton,
& Clark, 2012). This change means that the free energy function can be used
to compare two or more agents (models) to each other (a better agent is the