Page 98 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience

Section 3.5. The MPEG-4 Standard



[Block diagram. Labeled elements: input VOP; motion-compensated VOP; texture encoding (texture information); texture decoding (decoded texture); decoded current VOP; VOP memory (decoded reference VOP); motion estimation (ME); motion compensation (MC); motion encoding (motion information); shape encoding (shape information); multiplex; bitstream.]

                               Figure 3.10:  An MPEG-4 VOP encoder



            motion, and texture. This is illustrated in Figure 3.10. As can be seen, an
            MPEG-4 VOP encoder has three main functionalities: shape encoding, motion
            encoding (along with motion estimation and compensation), and texture
            encoding. Note that the structure of this encoder is very similar to the
            MC-DPCM structure utilized by H.263 and most other standards. In fact, for most
            cases, the texture encoder is DCT-based and the structure is very similar to
            the conventional hybrid MC-DPCM/DCT encoder. The difference here is that
            the encoded entities can have arbitrary shapes rather than the fixed rectangular
            frame shape, and therefore additional shape information needs to be encoded
            and transmitted. Note that this object-based representation can be thought of
            as a generic representation. When a frame is encoded using a single VOP, this
            generic representation degenerates into the special case of rectangular frames
            and an MPEG-4 encoder becomes almost identical to an H.263 encoder. In
            fact, the MPEG-4 standard provides measures to ensure some level of
            interoperability with MPEG-1/2 and H.263.
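
The prediction loop described above can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the encoder defined by the standard: the function name `encode_vop` and the parameter `qstep` are hypothetical, motion compensation is reduced to a zero-motion prediction from the reference, and the DCT-based texture coder is replaced by plain uniform quantization of the residual. What it does show is the three outputs (shape, motion, texture) and the fact that the encoder reconstructs the decoded VOP so that it can serve as the reference for the next one.

```python
def encode_vop(vop, mask, reference, qstep=8):
    """One pass of a simplified MC-DPCM loop over a single VOP.

    vop, reference: rows of pixel values (lists of lists).
    mask: rows of 0/1 flags, 1 meaning the pixel lies inside the object.
    Assumptions for illustration only: zero-motion prediction instead of
    real motion estimation, uniform quantization instead of a DCT coder.
    """
    height, width = len(vop), len(vop[0])
    levels = [[0] * width for _ in range(height)]
    decoded = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # outside the object shape: no texture is coded
            prediction = reference[y][x]       # stand-in for the motion-compensated VOP
            residual = vop[y][x] - prediction  # DPCM difference signal
            level = round(residual / qstep)    # stand-in for texture encoding
            levels[y][x] = level
            # Texture decoding plus prediction gives the decoded current VOP.
            decoded[y][x] = prediction + level * qstep
    streams = {"shape": mask, "motion": (0, 0), "texture": levels}
    return streams, decoded


vop = [[100, 50], [21, 0]]
mask = [[1, 1], [1, 0]]
reference = [[90, 50], [0, 0]]
streams, decoded = encode_vop(vop, mask, reference)
```

In a sequence, `decoded` (not the original `vop`) would be passed as `reference` when encoding the next VOP, so that encoder and decoder predict from the same data.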
               A VOP is encoded on a macroblock (MB) basis. MPEG-4 supports a 4:2:0
            subsampling format with 4 to 12 bits/sample. Thus, an MB consists of six 8 × 8
            blocks: four luma blocks and two corresponding chroma blocks. To achieve
            efficient encoding, the arbitrarily shaped VOP is first encapsulated within a
            bounding box. This bounding box is chosen such that it completely contains
            the VOP but uses the minimum number of macroblocks. This bounding box
            is illustrated for the Balloon VOP in Figure 3.11. Within this bounding box,