Page 189 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience
P. 189
166 Chapter 7. Reduced-Complexity Motion Estimation Techniques
and c. This algorithm reduces complexity by roughly a factor of 4. Since
smaller blocks are employed, the algorithm provides better prediction quality
than full-search with original size blocks. This is, however, at the expense of
a larger motion overhead.
7.6 Hierarchical Search Techniques
This category uses a multiresolution representation of video. The basic idea
is to perform motion estimation at each level successively, starting with the
lowest resolution level. Thus, motion estimation is )rst performed at the lowest
resolution level to obtain a rough estimate of the motion vector. This vector is
then passed to the next-higher resolution level to serve as an initial estimate.
Motion estimation at the higher resolution level is then used to re)ne this
initial estimate. This process is repeated until the highest resolution level is
reached. At lower resolution levels, smaller blocks are used for block matching.
This reduces the complexity of calculating BDMs. At higher-resolution levels,
smaller search ranges are used since motion estimation starts from a good
initial estimate. This reduces the number of locations to be searched. Both
factors (i.e., smaller blocks at low resolutions and smaller search ranges at
high resolutions) contribute to reducing the overall complexity of the search.
Note that when reducing the resolution of the searched frames, the motion
speed is also reduced. This makes hierarchical techniques particularly useful
for estimating, with reduced complexity, high motion content. Examples of
hierarchical motion estimation algorithms are reported in Refs. 159 and 160.
Figure 7.4 shows an example of a three-level hierarchical motion estimation
technique applied to a QCIF sequence. In this case, the current frame is )rst
used to generate three current frames with the resolutions: 44 × 36, 88 × 72,
and 176 × 144. Each resolution level is a low-pass )ltered and subsampled ver-
sion of the next-higher resolution level. The resulting representation is called
a mean pyramid. The same process is also applied to the reference frame
(i.e., the previous frame). Motion estimation starts at the lowest resolution
level with a block size of 4 × 4 pels and a search range of ± 3 pels. The es-
timated motion vector of a block in this resolution is scaled up by a factor of
2 (i.e., the scaled vector will have a maximum range of 2 × (± 3) = ± 6 pels)
and then be passed to the corresponding block in the next-higher resolution
level. Motion estimation in the next-higher resolution level uses a block size
of 8 × 8 pels and a search range of ± 1 pel around the propagated vector from
the lower resolution level. This produces a motion vector with a maximum
range of (± 6)+(± 1) = ± 7 pels, which is again scaled up by a factor of 2
(to a maximum range of ± 14) and propagated to the next-higher resolution
level. In this level, a block size of 16 × 16 pels is used with a search range