Page 130 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience
Section 4.6. Block-Matching Methods
Since the motion estimation process aims at minimizing the DFD signal, a
natural choice for the matching function is the mean squared error, which is
often formulated as the sum of squared differences (SSD):
\[
\mathrm{SSD}(i,j) = \sum_{(x,y)\in B} \bigl( f_t(x,y) - f_{t-\Delta t}(x-i,\,y-j) \bigr)^2. \tag{4.30}
\]
A very similar matching function is the sum of absolute differences (SAD):
\[
\mathrm{SAD}(i,j) = \sum_{(x,y)\in B} \bigl| f_t(x,y) - f_{t-\Delta t}(x-i,\,y-j) \bigr|. \tag{4.31}
\]
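The two matching functions differ only in how each pixel difference is accumulated. The following is a minimal sketch of Eqs. (4.30) and (4.31) in plain Python, not code from the book. It assumes frames are stored as lists of rows indexed as `frame[y][x]`, that the block B is the n × n square whose top-left corner is `(x0, y0)`, and that the caller has already checked that the displaced block lies inside the reference frame. The function and parameter names are illustrative.

```python
def ssd(cur, ref, x0, y0, i, j, n=16):
    """Sum of squared differences, Eq. (4.30), for candidate vector (i, j)."""
    total = 0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            d = cur[y][x] - ref[y - j][x - i]
            total += d * d          # one multiplication per pixel
    return total


def sad(cur, ref, x0, y0, i, j, n=16):
    """Sum of absolute differences, Eq. (4.31), for candidate vector (i, j)."""
    total = 0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            total += abs(cur[y][x] - ref[y - j][x - i])   # no multiplication
    return total


# Toy 4 x 4 frames: the 2 x 2 pattern at (1, 1) in the current frame
# sits at (0, 0) in the reference frame, i.e. the true vector is (1, 1).
cur = [[10, 10, 10, 10],
       [10, 50, 60, 10],
       [10, 70, 80, 10],
       [10, 10, 10, 10]]
ref = [[50, 60, 10, 10],
       [70, 80, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]

print(sad(cur, ref, 1, 1, 1, 1, n=2), ssd(cur, ref, 1, 1, 1, 1, n=2))  # 0 0
print(sad(cur, ref, 1, 1, 0, 0, n=2), ssd(cur, ref, 1, 1, 0, 0, n=2))  # 210 11900
```

Both measures are zero at the true displacement. Away from it, SSD penalizes large differences more heavily than SAD because each difference is squared.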
To compare the performance of these matching functions, a full-pel full-
search BMA was implemented. The algorithm uses 16 × 16 blocks and a max-
imum allowed motion displacement of ±15 pels in both directions. In this
algorithm, motion is estimated and compensated using original previous frames,
and motion vectors are restricted so that they do not point outside the reference
frame. Motion vectors are encoded using the median predictor and the VLC
table of the H.263 standard. Unless otherwise stated, all subsequent results in
this chapter use the same simulation conditions. Figure 4.3 compares the
performances of the algorithm with different matching functions when applied to
the first 10 frames of the FOREMAN sequence at a frame rate of 8.33 frames/s
(i.e., a frame skip of 3; see footnote 4). The quoted PSNR values are for the
luma component only. It can be seen from this figure that the SSD measure achieves
the best performance, followed very closely by the SAD measure. The NCCF
measure, on the other hand, has the worst performance. While Figure 4.3 com-
pares the performance in terms of prediction quality, Table 4.1 compares the
performances in terms of computational complexity. It can be seen that the
SAD measure has the lowest computational complexity, because it involves
no multiplications. Because of its good prediction quality and small computa-
tional complexity, SAD is preferred by most implementations. All subsequent
results assume the use of SAD as the matching function.
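The simulation conditions described above (full-pel full search, a square block, a bounded displacement range, vectors restricted to the reference frame, and SAD as the matching function) can be sketched as follows. This is an illustrative sketch only, not the implementation used to produce the reported results. The list-of-rows frame layout, the function names, and the tie-breaking rule (the zero vector is tried first and replaced only by a strictly smaller SAD) are assumptions. Motion vector coding with the H.263 median predictor and VLC table is not shown.

```python
def sad(cur, ref, x0, y0, i, j, n):
    """SAD, Eq. (4.31), between the n x n block of cur at (x0, y0) and the
    block of ref displaced by the candidate vector (i, j)."""
    return sum(
        abs(cur[y][x] - ref[y - j][x - i])
        for y in range(y0, y0 + n)
        for x in range(x0, x0 + n)
    )


def full_search(cur, ref, x0, y0, n=16, max_disp=15):
    """Return ((i, j), cost) for the SAD-minimizing full-pel vector.

    Assumes cur and ref have the same size and that the block at
    (x0, y0) lies fully inside the frame.
    """
    height, width = len(ref), len(ref[0])
    # Assumption: start from the zero vector so that ties favour (0, 0).
    best_vec, best_cost = (0, 0), sad(cur, ref, x0, y0, 0, 0, n)
    for j in range(-max_disp, max_disp + 1):
        for i in range(-max_disp, max_disp + 1):
            # Skip vectors that point outside the reference frame.
            rx, ry = x0 - i, y0 - j
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, x0, y0, i, j, n)
            if cost < best_cost:
                best_vec, best_cost = (i, j), cost
    return best_vec, best_cost


# Toy example: an 8 x 8 reference frame in which every pixel value is unique,
# and a current frame whose 4 x 4 block at (3, 2) was copied from the
# reference with true displacement (i, j) = (1, -2).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(3, 7):
        cur[y][x] = ref[y + 2][x - 1]

print(full_search(cur, ref, 3, 2, n=4, max_disp=2))  # ((1, -2), 0)
```

With the default arguments (`n=16`, `max_disp=15`) an interior block has (2 × 15 + 1)² = 961 candidate vectors, each requiring 256 pixel comparisons, which is why the per-pixel cost of the matching function matters so much in a full search.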
There are many other proposed matching functions. Most of them attempt
to further reduce complexity, but this is often at the expense of a reduced
prediction quality. A more detailed discussion of such functions is deferred to
Chapter 7.
4 Throughout this book, the term frame skip will be used to quantify the amount of temporal
subsampling with respect to the original frame rate. For example, a frame skip of 3 means
that the original sequence is temporally subsampled by a factor of 3:1. Thus, if the original
sequence has a frame rate of 30 frames/s, then the subsampled sequence will have a frame rate
of 30/3 = 10 frames/s.
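The frame-skip arithmetic in this footnote can be checked with a short sketch. The helper names are hypothetical, and keeping the first frame of every group of `frame_skip` frames is an assumption, since the footnote only fixes the subsampling ratio.

```python
def temporally_subsample(frames, frame_skip):
    """Keep one frame out of every frame_skip, starting with the first."""
    return frames[::frame_skip]


def subsampled_frame_rate(original_rate, frame_skip):
    """Frame rate after temporal subsampling by frame_skip:1."""
    return original_rate / frame_skip


print(temporally_subsample(list(range(10)), 3))   # [0, 3, 6, 9]
print(subsampled_frame_rate(30, 3))               # 10.0
print(round(subsampled_frame_rate(25, 3), 2))     # 8.33
```

By the same arithmetic, a rate of 8.33 frames/s at a frame skip of 3 corresponds to an original rate of 25 frames/s.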