Page 215 -
P. 215
Section 6.4 Image Denoising 183
We have already seen that linear filters such as Gaussian kernels are effective
at suppressing noise, but that the price to pay is a loss in image detail. We briefly
discuss in this section three related approaches to image denoising that are much
more effective. They rely on two properties of natural images: the prominence of
self-similarities—that is, many small regions in the same picture often look the
same—and the effectiveness of sparse linear models—that is, small image patches
are typically well reconstructed as a linear combination of very few elements from
a potentially large basis set, or dictionary.
6.4.1 Non-local Means
Efros and Leung (1999) have shown that the self-similarities inherent to natural
images can be used effectively in texture synthesis tasks. Following this insight,
Buades, Coll, and Morel (2005) have introduced the non-local means approach to
image denoising, where the prominence of self-similarities is used as a prior on
natural images. Concretely, let us consider a noisy image written as a column
n
vector y in R , and denote by y[i]the i-th pixel value and by y the patch of size
i
m
m centered on this pixel and considered as an element of R . Similar patches y and
i
y should have similar values y[i]and y[j]. This suggests estimating the denoised
j
pixel x[i] as a weighted average (the so-called Nadaraya-Watson estimator) of all
the other pixels in the image:
n
G h (y − y )
i
j
x[i]= n y[j], (6.1)
l
i
j=1 l=1 G h (y − y )
where G h is a multi-dimensional Gaussian kernel of standard deviation h.The
weights depend on appearance similarity instead of spatial proximity in the case
of Gaussian smoothing, hence the name of non-local means. This simple approach
gives excellent results in practice, and, although naive implementations are slow (all
image pixels are used to denoise a single one), they can be sped up by using various
heuristics (by considering only patches y in some fixed spatial neighborhood of
j
y , for example). The parameters h can be taken proportional to the standard
i
deviation σ of the noise in practice; for example, h =12σ is used in the experiments
of Buades, Coll and Morel (2005).
6.4.2 Block Matching 3D (BM3D)
Classical shrinkage is a very different method for denoising. It can be summarized
as follows: Consider a signal y in R m and some nonsingular m × m matrix T .
We associate with y its code α = T y and the thresholded value α ε , obtained by
i
zeroing all coefficients α smaller than some ε> 0in the hard thresholding case, or
by setting
i
i
i
α = sign(α )(|α |− ε) + ,
ε
in the soft thresholding one (here, x + is equal to x when x> 0, and to zero oth-
−1
erwise). The denoised signal is x ε = T α ε , the idea being that noise shows up
mostly in small coefficients in the transformed domain, which is of course true only
for appropriate transformations. A classical example is wavelet shrinkage (Donoho
and Johnstone 1995), where T is the orthogonal matrix representing the discrete