3.7 Global optimization
The use of Bayesian modeling has several potential advantages over regularization (see
also Appendix B). The ability to model measurement processes statistically enables us to
extract the maximum information possible from each measurement, rather than just guessing
what weighting to give the data. Similarly, the parameters of the prior distribution can often
be learned by observing samples from the class we are modeling (Roth and Black 2007a;
Tappen 2007; Li and Huttenlocher 2008). Furthermore, because our model is probabilistic,
it is possible to estimate (in principle) complete probability distributions over the unknowns
being recovered and, in particular, to model the uncertainty in the solution, which can be
useful in later processing stages. Finally, Markov random field models can be defined over
discrete variables, such as image labels (where the variables have no proper ordering), for
which regularization does not apply.
Recall from (3.68) in Section 3.4.3 (or see Appendix B.4) that, according to Bayes’ Rule,
the posterior distribution over the unknowns x, given a set of measurements y, is obtained by
combining the measurement likelihood p(y|x) with a prior p(x) over the unknowns,

p(x|y) = p(y|x) p(x) / p(y),   (3.106)

where p(y) = ∫_x p(y|x) p(x) is a normalizing constant used to make the p(x|y) distribution
proper (integrate to 1). Taking the negative logarithm of both sides of (3.106), we get
−log p(x|y) = −log p(y|x) − log p(x) + C,   (3.107)
which is the negative posterior log likelihood.
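As a small sanity check (my own illustration, not from the text), the following Python sketch evaluates (3.106) and (3.107) for a discrete unknown x with three states; the prior and likelihood values are made up. It confirms that the negative log posterior equals −log p(y|x) − log p(x) up to the constant C = log p(y):

    import numpy as np

    # Hypothetical example: x takes one of three discrete states and y is a
    # single observation.  The numbers below are purely illustrative.
    prior = np.array([0.5, 0.3, 0.2])        # p(x)
    likelihood = np.array([0.7, 0.2, 0.1])   # p(y | x) for the observed y

    # Bayes' Rule (3.106): p(x | y) = p(y | x) p(x) / p(y).
    evidence = np.sum(likelihood * prior)    # p(y), the normalizing constant
    posterior = likelihood * prior / evidence

    # Negative log posterior (3.107): -log p(x|y) = -log p(y|x) - log p(x) + C.
    neg_log_posterior = -np.log(posterior)
    energy = -np.log(likelihood) - np.log(prior)   # data + prior terms, without C
    C = np.log(evidence)

    print(np.isclose(posterior.sum(), 1.0))            # True: proper distribution
    print(np.allclose(neg_log_posterior, energy + C))  # True: (3.107) holds
    print(np.argmax(posterior) == np.argmin(energy))   # True: same MAP state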
To find the most likely (maximum a posteriori or MAP) solution x given some measure-
ments y, we simply minimize this negative log likelihood, which can also be thought of as an
energy,
E(x, y) = E_d(x, y) + E_p(x).   (3.108)
(We drop the constant C because its value does not matter during energy minimization.) The
first term E_d(x, y) is the data energy or data penalty; it measures the negative log likelihood
that the data were observed given the unknown state x. The second term E_p(x) is the prior
energy; it plays a role analogous to the smoothness energy in regularization. Note that the
MAP estimate may not always be desirable, since it selects the “peak” in the posterior dis-
tribution rather than some more stable statistic—see the discussion in Appendix B.2 and by
Levin, Weiss, Durand et al. (2009).
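As a minimal illustration of (3.108) (a sketch of my own, with assumed quadratic terms rather than anything prescribed in the text), consider denoising a 1D signal where E_d penalizes deviation from the noisy input y and E_p is a first-order quadratic smoothness prior. For this Gaussian case the MAP estimate is the solution of a single linear system:

    import numpy as np

    def map_denoise_1d(y, lam=3.0):
        """MAP estimate for E(x, y) = sum (x_i - y_i)^2 + lam * sum (x_{i+1} - x_i)^2."""
        n = len(y)
        D = np.diff(np.eye(n), axis=0)        # (n-1) x n finite-difference operator
        # Setting dE/dx = 0 gives the linear system (I + lam * D^T D) x = y.
        return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

    rng = np.random.default_rng(0)
    truth = np.repeat([0.0, 1.0, 0.3], 30)    # piecewise-constant test signal
    y = truth + 0.25 * rng.standard_normal(truth.size)
    x_map = map_denoise_1d(y)
    # The MAP estimate is typically much closer to the clean signal than the raw data.
    print(np.mean((y - truth) ** 2), np.mean((x_map - truth) ** 2))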
For image processing applications, the unknowns x are the set of output pixels
x = [f(0, 0) ... f(m − 1, n − 1)],
and the data are (in the simplest case) the input pixels
y = [d(0, 0) ... d(m − 1, n − 1)]
as shown in Figure 3.56.
For a Markov random field, the probability p(x) is a Gibbs or Boltzmann distribution,
whose negative log likelihood (according to the Hammersley–Clifford theorem) can be writ-
ten as a sum of pairwise interaction potentials,
E_p(x) = Σ_{{(i,j),(k,l)} ∈ N} V_{i,j,k,l}(f(i, j), f(k, l)),   (3.109)
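For concreteness (my own illustration, not the book's), the prior energy (3.109) on a 4-connected pixel grid with an assumed truncated quadratic interaction potential could be evaluated as follows; any other pairwise potential V_{i,j,k,l} could be substituted:

    import numpy as np

    # Assumed truncated quadratic interaction potential; purely illustrative.
    def V(fp, fq, tau=1.0):
        return np.minimum((fp - fq) ** 2, tau)

    def prior_energy(f):
        """E_p(f) summed over all horizontal and vertical neighbor pairs in N."""
        horizontal = V(f[:, :-1], f[:, 1:]).sum()
        vertical = V(f[:-1, :], f[1:, :]).sum()
        return horizontal + vertical

    f = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    print(prior_energy(f))   # penalizes label discontinuities, truncated at tau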