In Ref. 137 it was proposed to use multiple global motion models to generate
the reference frames. Thus, reference frames in this case are warped versions
of the previously decoded frame using polynomial motion models. This can be
seen as an extension to GMC, where, in addition to the most dominant global
motion, less dominant motion is also captured by additional motion parameter
sets. In order to determine the multiple models, a robust clustering method
based on the iterative application of the least median of squares estimator
is employed. This model estimation method is computationally expensive. In
Ref. 138 an alternative method was proposed in which the past decoded frame
is split into blocks of fixed size. Each block is then used to estimate one
model using translational block matching followed by a gradient-based affine
refinement. In addition to reduced complexity, this method leads to higher
prediction gains.
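To make the idea of warped reference frames concrete, the following Python sketch (an illustration only, not the actual scheme of Refs. 137 or 138) shows how a six-parameter affine model, once estimated, can warp the previously decoded frame into an additional reference frame. The parameter layout, the toy frame size, and the nearest-neighbour sampling are simplifying assumptions:

import numpy as np

def warp_affine(prev_frame, params):
    # Backward mapping with the affine model
    #   x_src = a0 + a1*x + a2*y,   y_src = a3 + a4*x + a5*y,
    # so each pixel of the new reference frame is fetched from the
    # previously decoded frame at the model-predicted position.
    a0, a1, a2, a3, a4, a5 = params
    h, w = prev_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(a0 + a1 * xs + a2 * ys), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(a3 + a4 * xs + a5 * ys), 0, h - 1).astype(int)
    return prev_frame[src_y, src_x]            # nearest-neighbour sampling

# One warped reference frame per estimated motion model (values are arbitrary):
prev = np.random.randint(0, 256, (72, 88)).astype(np.uint8)   # toy decoded frame
models = [(0.0, 1.0, 0.0, 0.0, 0.0, 1.0),      # dominant (here: identity) motion
          (1.5, 1.01, 0.0, -0.5, 0.0, 0.99)]   # a less dominant motion cluster
references = [warp_affine(prev, m) for m in models]

In a real codec, each of these warped frames would then be searched by block-based motion estimation in the same way as an ordinary reference frame.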
In Ref. 139 it was demonstrated that combining the LTM-MCP method
of Refs. 135 and 136 with the multiple GMC method of Ref. 138 can lead to
further coding gains.
Recently, MR-MCP has been included in the enhanced reference picture
selection (ERPS) mode (Annex U) of H.263++ (refer to Chapter 3).
6.3 Long-Term Memory Motion-Compensated Prediction
As already discussed, there are many MR-MCP techniques. The main
difference between those techniques is in the way they generate the
reference frames. The simplest and least computationally complex approach
is the LTM-MCP technique, where past decoded frames are assembled in
the multiframe memory. This chapter will therefore concentrate on the LTM-
MCP technique. More complex techniques, such as multiple GMC, may not
be suitable for computationally constrained applications such as mobile video
communication.
There are many ways to control the multiframe memory in the LTM-MCP
technique. The simplest approach is to use a sliding-window control method.
Assuming that there are M frame memories, numbered 0, ..., M − 1, the most recently
decoded past frame is stored in frame memory 0, the frame that was decoded
M time instants before is stored in frame memory M − 1, and so on. At the
next time instant, the window is moved such that the oldest frame is dropped
from memory, the contents of frame memories 0, ..., M − 2 are shifted to frame
memories 1, ..., M − 1, and the new past decoded frame is stored in frame
memory 0. According to this arrangement, the new motion vector component
is in the range 0 ≤ d_t ≤ M − 1, where d_t = 0 refers to the most recent reference
frame.
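A minimal Python sketch of this sliding-window control is given below; the class name, the memory size M = 5, and the use of d_t as a direct index into the buffer are illustrative assumptions rather than part of any standard:

from collections import deque

class SlidingWindowMemory:
    def __init__(self, M):
        # deque(maxlen=M) drops the oldest frame automatically once full
        self.frames = deque(maxlen=M)

    def store(self, decoded_frame):
        # The new past decoded frame goes into frame memory 0; the
        # existing frames are implicitly shifted to memories 1 ... M-1.
        self.frames.appendleft(decoded_frame)

    def reference(self, d_t):
        # d_t = 0 addresses the most recent reference frame,
        # d_t = M-1 the oldest frame still inside the window.
        return self.frames[d_t]

memory = SlidingWindowMemory(M=5)
for n in range(8):
    memory.store("decoded frame %d" % n)
print(memory.reference(0))   # decoded frame 7 (most recent)
print(memory.reference(4))   # decoded frame 3 (oldest in the window)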