Page 95 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience


72 Chapter 3. Video Coding: Standards

The annex also defines seven levels (level 10 to level 70) of performance
capability for decoder implementation. For example, a decoder supporting the
first level, level 10, must support QCIF and sub-QCIF resolution decoding,
and must be capable of operating at a bit rate of up to 64,000 bits per second
with a picture decoding rate of up to 15,000/1001 (about 14.985) pictures per second.
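
As a rough illustration of the level 10 limits quoted above, the following minimal sketch checks whether a stream stays within them. The dictionary layout and the function name `fits_level` are hypothetical and are not taken from the annex; the picture sizes assume the usual QCIF (176 x 144) and sub-QCIF (128 x 96) luminance dimensions.

```python
from fractions import Fraction

# Luminance picture sizes (width, height) of the two formats named in the text.
SUB_QCIF = (128, 96)
QCIF = (176, 144)

# Level 10 limits as stated in the text. The dictionary layout is hypothetical.
LEVEL_10 = {
    "formats": {SUB_QCIF, QCIF},
    "max_bit_rate": 64_000,                       # bits per second
    "max_picture_rate": Fraction(15_000, 1001),   # pictures per second
}


def fits_level(level, size, bit_rate, picture_rate):
    """Return True if a stream stays within all limits of the given level."""
    return (
        size in level["formats"]
        and bit_rate <= level["max_bit_rate"]
        and Fraction(picture_rate) <= level["max_picture_rate"]
    )


# A QCIF stream at 48 kbit/s and exactly 15,000/1001 pictures per second fits.
ok = fits_level(LEVEL_10, QCIF, 48_000, Fraction(15_000, 1001))
# A nominal 15 pictures per second slightly exceeds the stated limit.
too_fast = fits_level(LEVEL_10, QCIF, 48_000, 15)
```

Using an exact fraction rather than a floating-point value keeps the comparison against 15,000/1001 free of rounding effects.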

3.5 The MPEG-4 Standard

As already discussed, the formal title “Generic coding of audiovisual objects”
given to MPEG-4 describes two important properties of the standard. The first
property is that it is a generic standard. It defines tools and algorithms for
the coding of natural, synthetic, and hybrid audiovisual objects over a wide
range of bit rates, picture formats, transmission media, etc. It is, therefore,
very difficult to describe the full functionality of such a generic standard in a
volume of this size.⁵ Thus, this section will concentrate on MPEG-4 natural
video coding. In particular, the section will try to highlight the second property
of MPEG-4, i.e., being object-based, which sets it apart from other standards.

3.5.1 An Object-Based Representation

MPEG-4 uses an object-based representation model. Thus, a scene is
represented, coded, and manipulated as individual audiovisual objects (AVOs). This
section concentrates on natural video objects.
As illustrated in Figure 3.7, an MPEG-4 video session (VS) is a collection
of one or more video objects (VOs). A VO is an entity that a user is allowed
to access (e.g., seek and browse) and manipulate (e.g., cut and paste). It can
be a simple rectangular frame or it can be an arbitrarily shaped object. A
VO can consist of one or more video object layers (VOLs). As is discussed
later, each VO can be encoded in either a scalable (multiple VOLs) or a
nonscalable (single VOL) form. Each VOL consists of an ordered sequence
of video object planes (VOPs). A VOP is an instance (or a snapshot) of
the corresponding VO at a given time. A number of VOPs can, optionally,
be grouped together in a group of video object planes (GOV). GOVs can
provide points in the bitstream where VOPs are encoded independently of
each other, and thus serve as random access points within the bitstream.
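
The VS, VO, VOL, and VOP hierarchy described above can be sketched as a set of nested containers. This is only an illustrative model, not the bitstream syntax of the standard: the class and field names are hypothetical, and a GOV is reduced here to an assumed `gov_start` flag on the VOP that opens the group.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """Video object plane: a snapshot of a VO at a given time."""
    time: float
    gov_start: bool = False  # hypothetical flag: this VOP opens a GOV


@dataclass
class VOL:
    """Video object layer: an ordered sequence of VOPs."""
    vops: List[VOP] = field(default_factory=list)

    def random_access_times(self) -> List[float]:
        # In this sketch, each GOV start is treated as a random access point.
        return [vop.time for vop in self.vops if vop.gov_start]


@dataclass
class VO:
    """Video object: one VOL (nonscalable) or several VOLs (scalable)."""
    layers: List[VOL] = field(default_factory=list)

    @property
    def scalable(self) -> bool:
        return len(self.layers) > 1


@dataclass
class VS:
    """Video session: a collection of one or more VOs."""
    objects: List[VO] = field(default_factory=list)


# A session with a single nonscalable VO whose only layer holds three VOPs.
base = VOL([VOP(0.0, gov_start=True), VOP(0.1), VOP(0.2, gov_start=True)])
session = VS([VO([base])])
```

In this model, `session.objects[0].scalable` is false because the VO has a single layer, and `base.random_access_times()` lists the times of the VOPs that open a GOV.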
Figure 3.8 shows a general block diagram of an MPEG-4 codec. The input
video is represented using a number of VOs. This object-based
representation either already exists (e.g., generated with chroma-key technology) or is

5 To give an indication of how generic the MPEG-4 standard is, the MPEG-4 draft [67] that
was used in writing the current section is more than 300 pages.
