Page 53 - Video Coding for Mobile Communications Efficiency, Complexity, and Resilience

30   Chapter 2. Video Coding: Fundamentals



[Figure: (a) Encoder: input frame → segment into N × N blocks → f(x, y) → forward transform → F(u, v) → quantizer → Ḟ(u, v) → symbol encoder. (b) Decoder: symbol decoder → Ḟ(u, v) → inverse transform → f̂(x, y) → combine N × N blocks → reconstructed frame.]
Figure 2.9: Block diagram of a transform coding system


A unitary space-frequency transform¹⁶ is applied to each block to produce an
N × N block of transform (spectral) coefficients, which are then suitably
quantized and coded. At the decoder, an inverse transform is applied to
reconstruct the frame. The main goal of the transform is to decorrelate the pels
of the input block. It does so by redistributing the energy of the pels and
concentrating most of it in a small set of transform coefficients. This is known
as energy compaction. The transform process can also be interpreted as a
coordinate rotation of the input, or as a decomposition of the input into
orthogonal basis functions weighted by the transform coefficients [29].
Compression comes about from two main mechanisms. First, low-energy
coefficients can be discarded with minimal impact on the reconstruction quality.
Second, the HVS has differing sensitivity to different frequencies. The retained
coefficients can therefore be quantized according to their visual importance.
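The Python sketch below illustrates energy compaction and coefficient discarding on a single block. It makes three assumptions. First, the orthonormal 8 × 8 DCT-II serves as the unitary transform, although the text has not yet chosen a transform at this point. Second, a synthetic linear-ramp block stands in for a smooth image region. Third, all helper names (`dct_matrix`, `forward_2d`, `inverse_2d`, `keep_largest`) are hypothetical. Keeping only the largest-magnitude coefficients is a crude stand-in for discarding low-energy coefficients. Quantization and HVS weighting are not modelled.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis function."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def forward_2d(block, c):
    """Separable 2-D transform F = C f C^T."""
    return matmul(matmul(c, block), transpose(c))

def inverse_2d(coeffs, c):
    """Inverse f = C^T F C, valid because C is orthonormal (C^-1 = C^T)."""
    return matmul(matmul(transpose(c), coeffs), c)

def energy(a):
    """Sum of squared entries."""
    return sum(v * v for row in a for v in row)

def keep_largest(coeffs, m):
    """Zero every coefficient smaller in magnitude than the m-th largest.
    Ties at the threshold are all kept, so more than m may survive."""
    thresh = sorted((abs(v) for row in coeffs for v in row), reverse=True)[m - 1]
    return [[v if abs(v) >= thresh else 0.0 for v in row] for row in coeffs]

N = 8
C = dct_matrix(N)
# Hypothetical smooth block f(x, y): a linear ramp, like a slowly varying region.
block = [[100.0 + 10.0 * x + 5.0 * y for y in range(N)] for x in range(N)]

F = forward_2d(block, C)
kept = keep_largest(F, 3)      # discard all but the 3 largest of 64 coefficients
recon = inverse_2d(kept, C)
mse = sum((a - b) ** 2 for ra, rb in zip(block, recon) for a, b in zip(ra, rb)) / (N * N)

print(f"pel energy {energy(block):.1f}, coefficient energy {energy(F):.1f}")
print(f"fraction of energy in 3 largest coefficients: {energy(kept) / energy(F):.5f}")
print(f"MSE after discarding the other 61: {mse:.2f}")
```

Because the transform is unitary, the two energies on the first line agree. The reconstruction error energy also equals the energy of the discarded coefficients, which is why dropping 61 of 64 coefficients from this smooth block costs so little.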
   When choosing a transform, three main properties are desired: good energy
compaction, data-independent basis functions, and fast implementation. The
Karhunen-Loève transform (KLT) is the optimal transform in an energy-compaction
sense. Unfortunately, this optimality stems from the fact that the KLT basis
functions depend on the covariance matrix of the input block. Recomputing the
basis functions for each block is a nontrivial computational task, and
transmitting them to the decoder adds overhead. These disadvantages severely
limit the use of the KLT in practical coding systems. The performance of many
suboptimal transforms with data-independent basis functions has been studied
[30]. Examples are the discrete Fourier transform (DFT), the discrete cosine
transform (DCT), the Walsh-Hadamard transform (WHT), and the Haar transform.
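The sketch below makes the KLT versus fixed-basis comparison concrete. It assumes a first-order Markov (AR(1)) model for one row of 8 pels with correlation coefficient ρ = 0.95. This is a common textbook model and is not taken from this text. The KLT coefficient variances are the eigenvalues of the covariance matrix R, computed here with cyclic Jacobi rotations, so the KLT basis changes whenever R changes. The DCT and WHT use fixed basis functions. All function names are hypothetical, and the printed fractions apply only to this assumed model.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis function."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def wht_matrix(n):
    """Orthonormal Walsh-Hadamard matrix in natural order; n a power of two."""
    return [[(-1) ** bin(i & j).count("1") / math.sqrt(n) for j in range(n)]
            for i in range(n)]

def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def coefficient_variances(t, r):
    """Diagonal of T R T^T: the variance of each transform coefficient."""
    n = len(r)
    return [sum(t[k][i] * r[i][j] * t[k][j] for i in range(n) for j in range(n))
            for k in range(n)]

def compaction(variances, m):
    """Fraction of the total variance carried by the m largest coefficients."""
    v = sorted(variances, reverse=True)
    return sum(v[:m]) / sum(v)

N, RHO = 8, 0.95  # assumed AR(1) model: correlation RHO ** |i - j| between pels i and j
R = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]

variances = {
    "KLT": jacobi_eigenvalues(R),  # KLT coefficient variances are the eigenvalues of R
    "DCT": coefficient_variances(dct_matrix(N), R),
    "WHT": coefficient_variances(wht_matrix(N), R),
    "none": [R[i][i] for i in range(N)],  # untransformed pels, unit variance each
}
print("fraction of variance in the m largest coefficients (m = 1, 2, 4)")
for name, v in variances.items():
    print(f"{name:5s}", "  ".join(f"{compaction(v, m):.4f}" for m in (1, 2, 4)))
```

Because the KLT variances are the eigenvalues of R, Ky Fan's maximum principle guarantees that no orthonormal transform can place more variance in its m largest coefficients. The printout therefore shows how close each fixed basis comes to that bound under the assumed model.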
It has been demonstrated that, of these, the DCT comes closest to the
energy-compaction performance of the optimal KLT [30]. This has motivated the
development of a number of fast DCT algorithms, e.g., Ref. 31. Due to these


             16 A unitary transform is a reversible linear transform with orthonormal basis functions [29].