Page 216 -

P. 216

Section 6.4 Image Denoising 184

T
wavelet transform (Mallat 1999), and the denoised signal is x ε = T α ε . In this
case, a method for selecting ε for a given noise level is available, along with theo-
retical guarantees about the reconstructed signal.
By construction, the vector x ε usually admits a sparse decomposition on
the basis of R m formed by the columns of T T = T −1 —that is, only a few of the
i
coeﬃcients α are nonzero. As further discussed in Chapter 22, sparse linear models
ε
are well suited to natural images, and Dabov et al. (2007) combine sparsity-inducing
shrinkage with the use of self-similarities. They stack similar image patches into
three-dimensional arrays (groups), then use shrinkage on the groups, coupled with
transformations such as the (three-dimensional) discrete cosine transform,or DCT.
Because the patches are similar, the decomposition of each group is expected to be
quite sparse, and denoised patches x ε are retrieved from the shrunken groups. The
ﬁnal value of a pixel is taken to be the average of the values x ε at this point for all
patches passing through it. In conjunction with a few simple heuristics, this simple
idea has proven to be very eﬀective, and it typically gives better results than regular
non-local means.

6.4.3 Learned Sparse Coding
An alternative is to assume that the clean signal can be approximated by a sparse
linear combination of elements from a (potentially large) set of vectors forming the
k columns of a so-called dictionary, which may be overcomplete (k> m). Under
this assumption, denoising a patch y in R m with a dictionary D in R m×k composed
of k elements amounts to solving the sparse decomposition problem

2
min ||α|| 1 s.t. ||y −Dα|| ≤ ε, (6.2)
2
α∈R k
where Dα is an estimate of the clean signal, and ||α|| 1 is the sparsity-inducing
1 norm formed by the sum of the absolute values of the coeﬃcients of α.As
shown in Chapter 22, the 1 regularization in Equation (6.2) leads to the convex
Lasso (Tibshirani 1996) and basis pursuit (Chen et al. 1999) problems, for which
eﬃcient algorithms are available. As shown in Elad and Aharon (2006), ε can be
chosen according to the standard deviation of the noise.
Various types of wavelets have been used as dictionaries for natural images.
Elad and Aharon (2006) have proposed instead to learn a dictionary D adapted
to the image at hand, and demonstrated that learned dictionaries lead to better
empirical performance than oﬀ-the-shelf ones. For an image of size n, a dictionary
in R m×k adapted to the n overlapping patches of size m (typically m =8 × 8 n)
associated with the image pixels, is learned by addressing the following optimization
problem:
n

2
min ||α i || 1 s.t. ||y −Dα i || ≤ ε, (6.3)
i
2
D∈C,A
i=1
where C is the set of matrices in R m×k with unit 2 -norm columns, A =[α 1 ,... , α n ]
is a matrix in R k×n , y is the i-th patch of the noisy image y, α i is the correspond-
i
ing code, and Dα i is the estimate of the denoised patch. Note that this procedure
implicitly assumes that the patches are independent from each other, which is ques-
tionable because they overlap. However, this approximation makes the correspond-

211 212 213 214 215 216 217 218 219 220 221