Page 95 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience

72                                      Chapter 3.  Video Coding:  Standards


               The annex also defines seven levels (level 10 to level 70) of performance
            capability for decoder implementation. For example, a decoder supporting the
            first level, level 10, must include support of QCIF and sub-QCIF resolution
            decoding, and must be capable of operation with a bit rate up to 64,000 bits per
            second with a picture decoding rate up to 15,000/1001 (about 14.985) pictures
            per second.
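            The level 10 limits quoted above can be turned into a small numerical check.
            The following is only a sketch: the helper name fits_level_10 is hypothetical,
            and the luminance dimensions (176x144 for QCIF, 128x96 for sub-QCIF) are the
            usual definitions of those formats rather than values taken from the annex text.

```python
from fractions import Fraction

# Limits quoted in the text for level 10.
MAX_BIT_RATE = 64_000                      # bits per second
MAX_PICTURE_RATE = Fraction(15_000, 1001)  # pictures per second

# Usual luminance dimensions of the two formats named in the text (assumption).
FORMATS = {"sub-QCIF": (128, 96), "QCIF": (176, 144)}


def fits_level_10(fmt, bit_rate, picture_rate):
    """Hypothetical helper: check a stream against the three limits above."""
    return (fmt in FORMATS
            and bit_rate <= MAX_BIT_RATE
            and picture_rate <= MAX_PICTURE_RATE)


# Average bit budget per picture when running at both limits at once.
bits_per_picture = MAX_BIT_RATE / MAX_PICTURE_RATE

print(float(MAX_PICTURE_RATE))   # about 14.985 pictures per second
print(float(bits_per_picture))   # about 4270.9 bits per picture
```

            Keeping the picture rate as an exact fraction avoids rounding the 1001
            denominator away before the comparison is made.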

            3.5  The MPEG-4 Standard


            As already discussed, the formal title “Generic coding of audiovisual objects”
            given to MPEG-4 describes two important properties of the standard. The first
            property is that it is a generic standard. It defines tools and algorithms for
            the coding of natural, synthetic, and hybrid audiovisual objects with a wide
            range of bit rates, picture formats, transmission media, etc. It is, therefore,
            very difficult to describe the full functionality of such a generic standard in a
            volume of this size.⁵ Thus, this section will concentrate on MPEG-4 natural
            video coding. In particular, the section will try to highlight the second property
            of MPEG-4, i.e., being object-based, which sets it apart from other standards.

            3.5.1  An Object-Based Representation

            MPEG-4  uses  an  object-based  representation  model.  Thus,  a  scene  is  repre-
            sented, coded, and manipulated as individual audiovisual objects (AVOs). This
            section concentrates on  natural  video  objects.
               As illustrated in Figure 3.7, an MPEG-4 video session (VS) is a collection
            of one or more video objects (VOs). A VO is an entity that a user is allowed
            to access (e.g., seek and browse) and manipulate (e.g., cut and paste). It can
            be  a  simple  rectangular  frame  or  it  can  be  an  arbitrarily  shaped  object.  A
            VO  can  consist  of  one  or  more  video  object  layers  (VOLs).  As  is  discussed
            later,  each  VO  can  be  encoded  in  either  a  scalable  (multiple  VOLs)  or  a
            nonscalable  (single  VOL)  form.  Each  VOL  consists  of  an  ordered  sequence
            of  video  object  planes  (VOPs).  A  VOP  is  an  instance  (or  a  snapshot)  of
            the  corresponding  VO  at  a  given  time.  A  number  of  VOPs  can,  optionally,
            be  grouped  together  in  a  group  of  video  object  planes  (GOV).  GOVs  can
            provide  points  in  the  bitstream  where  VOPs  are  encoded  independently  from
            each other.  This provides random  access points  within the bitstream.
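            The containment hierarchy just described (VS, VO, VOL, VOP, with optional GOV
            grouping) can be pictured as nested data structures. The sketch below is
            illustrative only: the class and field names are assumptions made for this
            example, the starts_gov flag is a simplified stand-in for GOV grouping, and
            none of it reflects the actual MPEG-4 bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """One snapshot of a video object at a given time."""
    time: float
    starts_gov: bool = False  # assumption: marks the first VOP of a GOV


@dataclass
class VOL:
    """An ordered sequence of VOPs."""
    vops: List[VOP] = field(default_factory=list)

    def random_access_times(self):
        # In this sketch, GOV starts stand in for the random access points.
        return [v.time for v in self.vops if v.starts_gov]


@dataclass
class VO:
    """A video object: one VOL (nonscalable) or several VOLs (scalable)."""
    name: str
    vols: List[VOL] = field(default_factory=list)

    @property
    def scalable(self):
        return len(self.vols) > 1


@dataclass
class VS:
    """A video session: a collection of one or more VOs."""
    vos: List[VO] = field(default_factory=list)


layer = VOL([VOP(0.0, starts_gov=True), VOP(0.1), VOP(0.2),
             VOP(0.3, starts_gov=True)])
session = VS([VO("background", [layer]), VO("speaker", [VOL(), VOL()])])

print([vo.scalable for vo in session.vos])
print(layer.random_access_times())
```

            Here the "background" object is nonscalable (a single VOL) and the "speaker"
            object is scalable (two VOLs); both object names are made up for the example.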
               Figure 3.8 shows a general block diagram of an MPEG-4 codec. The input
            video  is  represented  using  a  number  of  VOs.  This  object-based  representa-
            tion  either  already  exists  (e.g.,  generated  with  chroma-key  technology)  or  is


              5 To give an indication of how generic the MPEG-4 standard is, the MPEG-4 draft [67] that
            was used in writing the current section is more than 300 pages long.