
2.3 The digital camera

               Figure 2.33 Image compressed with JPEG at three quality settings. Note how the amount of block artifact and
               high-frequency aliasing (“mosquito noise”) increases from left to right.




               higher fidelity than the chrominance signal. (Recall that the human visual system has poorer
               frequency response to color than to luminance changes.) In video, it is common to subsam-
               ple Cb and Cr by a factor of two horizontally; with still images (JPEG), the subsampling
               (averaging) occurs both horizontally and vertically.
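
               The subsampling step described above can be sketched as follows. This is a minimal
               illustration, assuming the chrominance plane is a NumPy array and that subsampling is done
               by averaging non-overlapping blocks; the function name subsample_chroma and its parameters
               are hypothetical and are not taken from any codec.

```python
import numpy as np

def subsample_chroma(chroma, horizontal=2, vertical=2):
    """Average non-overlapping (vertical x horizontal) blocks of one chroma plane."""
    h, w = chroma.shape
    # Assumption: crop so both dimensions divide evenly (a real codec would pad instead).
    h2, w2 = h - h % vertical, w - w % horizontal
    blocks = chroma[:h2, :w2].astype(np.float64).reshape(
        h2 // vertical, vertical, w2 // horizontal, horizontal)
    return blocks.mean(axis=(1, 3))

# A tiny 4 x 4 stand-in for a Cb plane.
cb = np.arange(16, dtype=np.float64).reshape(4, 4)

# Still-image style: subsample both horizontally and vertically.
still = subsample_chroma(cb, horizontal=2, vertical=2)

# Video style: subsample horizontally only.
video = subsample_chroma(cb, horizontal=2, vertical=1)

print(still.shape, video.shape)
```

               With factor-of-two subsampling in both directions, each chrominance plane carries a quarter
               of the samples of the luminance plane.
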
                  Once the luminance and chrominance images have been appropriately subsampled and
               separated into individual images, they are then passed to a block transform stage. The most
               common technique used here is the discrete cosine transform (DCT), which is a real-valued
               variant of the discrete Fourier transform (DFT) (see Section 3.4.3). The DCT is a reasonable
               approximation to the Karhunen–Loève or eigenvalue decomposition of natural image patches,
               i.e., the decomposition that simultaneously packs the most energy into the first coefficients
               and diagonalizes the joint covariance matrix among the pixels (makes the transform coefficients
               uncorrelated). Both MPEG and JPEG use 8 × 8 DCTs (Wallace 1991;
               Le Gall 1991), although newer variants use smaller 4 × 4 blocks, and alternative transformations
               such as wavelets (Taubman and Marcellin 2002) and lapped transforms (Malvar 1990, 1998,
               2000) are now also used.
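
               To make the energy-packing property concrete, here is a small sketch of an 8 × 8 block DCT.
               It assumes the orthonormal DCT-II, built directly as a matrix with NumPy; the helper names
               dct_matrix, dct2, and idct2 are hypothetical, and the smooth ramp block is only a stand-in
               for a natural image patch.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # the DC row uses a different scale factor
    return c

def dct2(block):
    """Separable 2D DCT: transform the columns, then the rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    """Inverse 2D DCT (the transpose, because the basis is orthonormal)."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

# A smooth 8 x 8 ramp, used here as a stand-in for a slowly varying patch.
block = 4.0 * np.add.outer(np.arange(8), np.arange(8))
coeffs = dct2(block)

# Fraction of the total energy held by the four lowest-frequency coefficients.
low_freq_fraction = np.sum(coeffs[:2, :2] ** 2) / np.sum(coeffs ** 2)
print(low_freq_fraction)
```

               For a smooth block like this one, nearly all of the energy ends up in a handful of
               low-frequency coefficients, which is what makes the later quantization step effective.
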

                  After transform coding, the coefficient values are quantized into a set of small integer
               values that can be coded using a variable bit length scheme such as a Huffman code or an
               arithmetic code (Wallace 1991). (The DC (lowest frequency) coefficients are also adaptively
               predicted from the previous block’s DC values. The term “DC” comes from “direct current”,
               i.e., the non-sinusoidal or non-alternating part of a signal.) The step size in the quantization
               is the main variable controlled by the quality setting on the JPEG file (Figure 2.33).
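
               The effect of the quantization step size can be illustrated with a short sketch. It makes a
               simplifying assumption: a single scalar step is applied to every coefficient, whereas JPEG
               uses a table of per-coefficient step sizes scaled by the quality setting. The function names
               and the sample coefficient values are hypothetical.

```python
import numpy as np

def quantize(coeffs, step):
    """Divide by the step size and round, giving small integer levels."""
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels, step):
    """Decoder side: scale the integer levels back up by the step size."""
    return levels.astype(np.float64) * step

def dc_differences(dc_levels):
    """Predict each block's DC level from the previous block; keep only the differences."""
    dc = np.asarray(dc_levels, dtype=np.int64)
    return np.diff(dc, prepend=0)   # the first block is predicted from zero

# Made-up transform coefficients for the low-frequency corner of one block.
coeffs = np.array([[620.0, -31.0, 4.0],
                   [18.0, -6.0, 1.5],
                   [-3.0, 2.0, 0.4]])

fine = quantize(coeffs, step=4.0)     # small step: higher quality, more nonzero levels
coarse = quantize(coeffs, step=32.0)  # large step: lower quality, mostly zeros

print(np.count_nonzero(fine), np.count_nonzero(coarse))
print(dc_differences([155, 157, 156, 160]))
```

               A larger step zeroes out more coefficients, which the variable-length coder can then
               represent in very few bits, at the cost of a reconstruction error of up to half a step per
               coefficient.
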
                  With video, it is also usual to perform block-based motion compensation, i.e., to encode
               the difference between each block and a predicted set of pixel values obtained from a shifted
               block in the previous frame. (The exception is the motion-JPEG scheme used in older DV
               camcorders, which is nothing more than a series of individually JPEG compressed image
               frames.) While basic MPEG uses 16 × 16 motion compensation blocks with integer motion
               values (Le Gall 1991), newer standards use adaptively sized blocks, sub-pixel motions, and
               the ability to reference blocks from older frames. In order to recover more gracefully from
               failures and to allow for random access to the video stream, predicted P frames are interleaved
               among independently coded I frames. (Bi-directional B frames are also sometimes used.)
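
               Block-based motion compensation with integer motion values can be sketched as an
               exhaustive search. The matching criterion (sum of absolute differences), the search range,
               and the function names below are assumptions made for illustration; the text does not
               specify how an encoder finds the shifted block.

```python
import numpy as np

def best_motion_vector(prev, cur, top, left, block=16, search=4):
    """Exhaustive integer-pixel search for the shift into `prev` that best predicts one block."""
    target = cur[top:top + block, left:left + block].astype(np.float64)
    best_sad, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue  # candidate block falls outside the previous frame
            candidate = prev[y:y + block, x:x + block].astype(np.float64)
            sad = np.abs(target - candidate).sum()  # assumed matching cost
            if sad < best_sad:
                best_sad, best_dy, best_dx = sad, dy, dx
    return best_dy, best_dx

def prediction_residual(prev, cur, top, left, dy, dx, block=16):
    """Difference between the current block and its motion-compensated prediction."""
    target = cur[top:top + block, left:left + block].astype(np.float64)
    predicted = prev[top + dy:top + dy + block,
                     left + dx:left + dx + block].astype(np.float64)
    return target - predicted

# Synthetic test: the current frame is the previous frame shifted by (2, -3) pixels.
rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(48, 48)).astype(np.float64)
cur_frame = np.roll(prev_frame, shift=(2, -3), axis=(0, 1))

dy, dx = best_motion_vector(prev_frame, cur_frame, top=16, left=16)
res = prediction_residual(prev_frame, cur_frame, 16, 16, dy, dx)
print((dy, dx), np.abs(res).max())
```

               Only the motion vector and the residual need to be coded for such a block; when the
               prediction is good, the residual is close to zero and compresses well.
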

                  The quality of a compression algorithm is usually reported using its peak signal-to-noise