
Section 6.4  Image Denoising
FIGURE 6.16: Sparsity vs. joint sparsity: gray squares represent nonzero values in vectors (left) or a matrix (right). Reprinted from "Non-local Sparse Models for Image Restoration," by J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Proc. International Conference on Computer Vision, (2009). © 2009, IEEE.


making optimization tractable. Indeed, although dictionary learning is traditionally considered very costly, online algorithms such as the procedure described in Chapter 22 and in Mairal, Bach, Ponce, and Sapiro (2010) make it possible to process millions of patches efficiently, allowing the use of large photographs and/or large image databases. In typical applications, the dictionary D is first learned on such a database, then refined on the image of interest itself using the same process.
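As a concrete illustration, here is a minimal sketch of this two-stage procedure using scikit-learn's MiniBatchDictionaryLearning, an online algorithm in the spirit of (though not identical to) the one described in Chapter 22. The function name, patch size, and number of atoms are illustrative choices, not values prescribed by the text.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def learn_dictionary(database_images, noisy_image, patch_size=8, n_atoms=256):
    # Gather training patches from the database (grayscale floats in [0, 1]).
    patches = np.vstack([
        extract_patches_2d(img, (patch_size, patch_size), max_patches=10_000)
            .reshape(-1, patch_size ** 2)
        for img in database_images
    ])
    patches = patches - patches.mean(axis=1, keepdims=True)  # remove DC component

    # Stage 1: learn D on the image database.
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                       batch_size=256)
    dico.fit(patches)

    # Stage 2: refine D on patches from the image of interest, warm-starting
    # from the dictionary learned on the database.
    img_patches = extract_patches_2d(noisy_image, (patch_size, patch_size))
    img_patches = img_patches.reshape(-1, patch_size ** 2)
    img_patches = img_patches - img_patches.mean(axis=1, keepdims=True)
    refined = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                          batch_size=256,
                                          dict_init=dico.components_)
    refined.fit(img_patches)
    return refined
```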
Once the dictionary D and the codes α_i have been learned, every pixel admits m estimates (one per patch containing it), and its final value can be computed by averaging these estimates.
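In code, this final averaging step might look as follows. This is a straightforward numpy sketch assuming the denoised patches D α_i are supplied in row-major order of their top-left corners; all names are illustrative.

```python
import numpy as np

def average_patch_estimates(denoised_patches, image_shape, patch_size):
    # denoised_patches: (n_patches, p, p) array, one reconstruction D @ alpha
    # per patch, ordered row-major by top-left corner position.
    h, w = image_shape
    p = patch_size
    acc = np.zeros((h, w))    # running sum of the estimates at each pixel
    cnt = np.zeros((h, w))    # number of patches covering each pixel (up to m)
    idx = 0
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            acc[i:i + p, j:j + p] += denoised_patches[idx]
            cnt[i:i + p, j:j + p] += 1
            idx += 1
    return acc / cnt          # each pixel is the average of its estimates
```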
Let us close this section by showing that self-similarities can also be exploited in this framework. Concretely, a joint sparsity pattern, that is, a common set of nonzero coefficients, can be imposed on a set of vectors α_1, ..., α_l through a grouped-sparsity regularizer on the matrix A = [α_1, ..., α_l] in R^{k×l} (Figure 6.16). This amounts to limiting the number of nonzero rows of A, or replacing the ℓ_1 vector norm in Equation (6.3) by the ℓ_{1,2} matrix norm
\|A\|_{1,2} = \sum_{i=1}^{k} \|\alpha^i\|_2,                 (6.4)

where α^i denotes the i-th row of A.
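Equation (6.4) is a one-liner in numpy; the comment records why penalizing it yields joint sparsity.

```python
import numpy as np

def l12_norm(A):
    # Sum of the l2 norms of the rows of A (Equation 6.4). Penalizing this
    # quantity zeroes out entire rows of A, giving the shared sparsity
    # pattern of Figure 6.16 (right).
    return np.linalg.norm(A, axis=1).sum()
```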
Similar to the BM3D groups, we can define for each y_i the set S_i of patches similar to it, using for example a threshold on the inter-patch Euclidean distance.
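A brute-force sketch of this grouping step is given below. The threshold tau is an assumed parameter (in practice it would scale with the noise level), and for large images one would restrict the search to a window around each patch rather than computing all pairwise distances.

```python
import numpy as np

def build_groups(patches, tau):
    # patches: (n, d) array, one flattened patch y_i per row.
    # Returns S_i for each i: the indices j with ||y_i - y_j||_2 <= tau.
    sq = (patches ** 2).sum(axis=1)
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * patches @ patches.T
    return [np.flatnonzero(row <= tau ** 2) for row in d2]
```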
The dictionary learning problem can now be written as

\min_{(A_i)_{i=1}^{n}, \, D \in \mathcal{C}} \; \sum_{i=1}^{n} \frac{\|A_i\|_{1,2}}{|S_i|} \quad \text{s.t.} \quad \forall i, \; \sum_{j \in S_i} \|y_j - D\alpha_{ij}\|_2^2 \le \varepsilon_i,                 (6.5)

where A_i = [α_{ij}]_{j∈S_i} in R^{k×|S_i|}, and an appropriate value of ε_i can be chosen
as before. The normalization by |S_i| gives equal weight to all groups. For a fixed dictionary, simultaneous sparse coding is convex and can be solved efficiently (Friedman 2001; Bach, Jenatton, Mairal, & Obozinski 2011). In turn, the dictionary can be learned using a simple and efficient modification of the algorithm presented in Chapter 22 and in Mairal et al. (2010), and the final image is estimated by averaging the estimates of each pixel. In practice, this method gives better results than plain dictionary learning.
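To make the simultaneous sparse coding step concrete, here is a minimal proximal-gradient (ISTA) sketch. Note that it solves a penalized variant of Equation (6.5), min_A 0.5 ||Y − D A||_F² + λ ||A||_{1,2}, for a single group with D fixed, rather than the constrained form, and λ is an assumed tuning parameter. The key ingredient is that the proximal operator of the ℓ_{1,2} norm shrinks each row of A toward zero as a block.

```python
import numpy as np

def block_soft_threshold(A, t):
    # Proximal operator of t * ||.||_{1,2}: scale each row of A toward zero,
    # zeroing rows whose l2 norm is below t (joint sparsity across columns).
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * A

def simultaneous_sparse_coding(D, Y, lam, n_iter=200):
    # D: (d, k) fixed dictionary; Y: (d, l) patches of one group as columns.
    k, l = D.shape[1], Y.shape[1]
    A = np.zeros((k, l))
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ A - Y)             # gradient of 0.5 * ||Y - D A||_F^2
        A = block_soft_threshold(A - grad / L, lam / L)
    return A                                 # columns alpha_ij share nonzero rows
```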