
3.7 Global optimization


                  The use of Bayesian modeling has several potential advantages over regularization (see
               also Appendix B). The ability to model measurement processes statistically enables us to
               extract the maximum information possible from each measurement, rather than just guessing
               what weighting to give the data. Similarly, the parameters of the prior distribution can often
               be learned by observing samples from the class we are modeling (Roth and Black 2007a;
               Tappen 2007; Li and Huttenlocher 2008). Furthermore, because our model is probabilistic,
               it is possible to estimate (in principle) complete probability distributions over the unknowns
               being recovered and, in particular, to model the uncertainty in the solution, which can be
useful in later processing stages. Finally, Markov random field models can be defined over
               discrete variables, such as image labels (where the variables have no proper ordering), for
               which regularization does not apply.
Recall from (3.68) in Section 3.4.3 (or see Appendix B.4) that, according to Bayes' Rule, the posterior distribution for a given set of measurements y, p(y|x), combined with a prior p(x) over the unknowns x, is given by

$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)}, \tag{3.106}$$

where $p(y) = \int_x p(y|x)\,p(x)$ is a normalizing constant used to make the p(x|y) distribution proper (integrate to 1). Taking the negative logarithm of both sides of (3.106), we get

$$-\log p(x|y) = -\log p(y|x) - \log p(x) + C, \tag{3.107}$$

               which is the negative posterior log likelihood.
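For a discrete unknown, Bayes' Rule and its negative logarithm can be evaluated directly. The sketch below uses a small hypothetical three-state example (the prior and likelihood values are made up for illustration):

```python
import numpy as np

# Hypothetical discrete example: x takes one of 3 states with prior p(x),
# and likelihood[i] = p(y|x=i) for the single observed measurement y.
prior = np.array([0.5, 0.3, 0.2])          # p(x)
likelihood = np.array([0.1, 0.6, 0.3])     # p(y|x)

evidence = np.sum(likelihood * prior)      # p(y), the normalizing constant
posterior = likelihood * prior / evidence  # p(x|y), as in (3.106)

# Negative log posterior, as in (3.107), up to the constant C = log p(y).
neg_log_posterior = -np.log(likelihood) - np.log(prior)
```

Note that the state minimizing `neg_log_posterior` is the same one maximizing `posterior`, since the constant C does not change the location of the minimum.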
                  To find the most likely (maximum a posteriori or MAP) solution x given some measure-
               ments y, we simply minimize this negative log likelihood, which can also be thought of as an
               energy,
$$E(x, y) = E_d(x, y) + E_p(x). \tag{3.108}$$
(We drop the constant C because its value does not matter during energy minimization.) The first term $E_d(x, y)$ is the data energy or data penalty; it measures the negative log likelihood that the data were observed given the unknown state x. The second term $E_p(x)$ is the prior energy; it plays a role analogous to the smoothness energy in regularization. Note that the
               MAP estimate may not always be desirable, since it selects the “peak” in the posterior dis-
               tribution rather than some more stable statistic—see the discussion in Appendix B.2 and by
               Levin, Weiss, Durand et al. (2009).
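When both energy terms in (3.108) are quadratic (Gaussian measurement noise plus a quadratic smoothness prior), the MAP estimate is the solution of a linear system, which coincides with first-order regularization. A minimal one-dimensional sketch, assuming quadratic penalties and a smoothness weight `lam` (both illustrative choices, not a prescribed method from the text):

```python
import numpy as np

def map_denoise_1d(d, lam=1.0):
    """MAP estimate for E(x) = sum_i (x_i - d_i)^2 + lam * sum_i (x_{i+1} - x_i)^2.

    Setting the gradient to zero gives (I + lam * L) x = d,
    where L is the graph Laplacian of the 1D chain.
    """
    n = len(d)
    A = np.eye(n)
    for i in range(n - 1):          # one term per neighboring pair (i, i+1)
        A[i, i] += lam
        A[i + 1, i + 1] += lam
        A[i, i + 1] -= lam
        A[i + 1, i] -= lam
    return np.linalg.solve(A, d)

noisy = np.array([0.0, 1.2, 0.9, 1.1, 0.2])
x_map = map_denoise_1d(noisy, lam=2.0)   # smoothed MAP estimate
```

With `lam = 0` the prior vanishes and the MAP estimate reproduces the data exactly; increasing `lam` trades data fidelity for smoothness.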
                  For image processing applications, the unknowns x are the set of output pixels
$$x = [f(0,0) \ldots f(m-1, n-1)],$$

               and the data are (in the simplest case) the input pixels
$$y = [d(0,0) \ldots d(m-1, n-1)]$$

               as shown in Figure 3.56.
                  For a Markov random field, the probability p(x) is a Gibbs or Boltzmann distribution,
               whose negative log likelihood (according to the Hammersley–Clifford theorem) can be writ-
               ten as a sum of pairwise interaction potentials,

$$E_p(x) = \sum_{\{(i,j),(k,l)\} \in \mathcal{N}} V_{i,j,k,l}\bigl(f(i,j), f(k,l)\bigr), \tag{3.109}$$
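As a concrete sketch of evaluating the energy in (3.108) and (3.109), the code below scores a discrete labeling f against observations d on a 4-neighborhood, using a Potts interaction potential V(f_p, f_q) = s·[f_p ≠ f_q] and a quadratic data term (both are illustrative choices, not the only potentials the theorem allows):

```python
import numpy as np

def mrf_energy(f, d, s=1.0):
    """Total MRF energy E(f, d) = E_d(f, d) + E_p(f) for a 2D label image."""
    # Data term E_d: quadratic penalty between labels and observations.
    e_data = np.sum((f - d) ** 2)
    # Prior term E_p: Potts potentials summed over all horizontal and
    # vertical neighbor pairs {(i,j),(k,l)} in the 4-neighborhood N.
    e_prior = s * (np.sum(f[:, 1:] != f[:, :-1]) +
                   np.sum(f[1:, :] != f[:-1, :]))
    return e_data + e_prior

d = np.array([[0, 0, 1],
              [0, 1, 1]])
f = d.copy()
print(mrf_energy(f, d, s=1.0))   # → 3.0 (prior cost only, since f == d)
```

A labeling that fits the data perfectly can still pay a prior cost wherever neighboring labels disagree, which is exactly the trade-off that MAP inference over (3.108) resolves.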