2.3 The digital camera 81
Figure 2.33 Image compressed with JPEG at three quality settings. Note how the amount of blocking artifacts and
high-frequency aliasing (“mosquito noise”) increases from left to right.
higher fidelity than the chrominance signal. (Recall that the human visual system has poorer
frequency response to color than to luminance changes.) In video, it is common to subsample
Cb and Cr by a factor of two horizontally; with still images (JPEG), the subsampling
(averaging) occurs both horizontally and vertically.
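As a rough illustration of the two subsampling patterns just described, the sketch below averages chroma samples in plain Python. It assumes a chroma plane (Cb or Cr) stored as a list of rows with even width and height. The function names are hypothetical, and real codecs differ in the averaging filter they use and in where the subsampled sample is positioned relative to the luma grid.

```python
def subsample_horizontal(channel):
    """Average horizontal pairs only (the horizontal-only video case,
    often written 4:2:2). Assumes an even width."""
    w = len(channel[0])
    if w % 2:
        raise ValueError("this sketch assumes an even image width")
    return [[(row[x] + row[x + 1]) / 2.0 for x in range(0, w, 2)]
            for row in channel]


def subsample_2x2(channel):
    """Average each 2x2 block (horizontal and vertical subsampling, as
    typically done for JPEG still images). Assumes even dimensions; a real
    encoder would pad or replicate edge pixels for odd sizes."""
    h, w = len(channel), len(channel[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even image dimensions")
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]
```

For a 2 × 4 chroma patch `[[10, 20, 30, 40], [30, 40, 50, 60]]`, the 2 × 2 version keeps one quarter of the samples (`[[25.0, 45.0]]`), while the horizontal-only version keeps half.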
Once the luminance and chrominance images have been appropriately subsampled and
separated into individual images, they are then passed to a block transform stage. The most
common technique used here is the discrete cosine transform (DCT), which is a real-valued
variant of the discrete Fourier transform (DFT) (see Section 3.4.3). The DCT is a reasonable
approximation to the Karhunen–Loève or eigenvalue decomposition of natural image patches,
i.e., the decomposition that simultaneously packs the most energy into the first coefficients
and diagonalizes the joint covariance matrix among the pixels (makes the transform coefficients
statistically uncorrelated). Both MPEG and JPEG use 8 × 8 DCTs (Wallace 1991;
Le Gall 1991), although newer variants use smaller 4 × 4 blocks or alternative transformations,
such as wavelets (Taubman and Marcellin 2002) and lapped transforms (Malvar 1990, 1998,
2000).
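A minimal sketch of the 8 × 8 block transform, assuming the orthonormal DCT-II, whose basis functions are α(k) cos(π(2n+1)k/16). The 1D transform is applied to the rows and then to the columns, and the inverse is included to show that the transform by itself is lossless. Baseline JPEG also subtracts 128 from 8-bit samples before the DCT; that level shift is omitted here. The names `dct2` and `idct2` are hypothetical.

```python
import math

N = 8  # block size used by baseline JPEG and MPEG


def _alpha(k):
    # Orthonormal scaling: sqrt(1/N) for the DC basis, sqrt(2/N) otherwise.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# DCT-II basis matrix: C[k][n] = alpha(k) * cos(pi * (2n + 1) * k / (2N)).
C = [[_alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def dct2(block):
    """Forward 2D DCT of an 8x8 block, computed as C @ block @ C^T."""
    # Transform each row: tmp[y][v] = sum_n block[y][n] * C[v][n].
    tmp = [[sum(block[y][n] * C[v][n] for n in range(N)) for v in range(N)]
           for y in range(N)]
    # Transform each column: out[u][v] = sum_y C[u][y] * tmp[y][v].
    return [[sum(C[u][y] * tmp[y][v] for y in range(N)) for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse 2D DCT, C^T @ coeffs @ C (C is orthogonal)."""
    # tmp[u][n] = sum_v coeffs[u][v] * C[v][n].
    tmp = [[sum(coeffs[u][v] * C[v][n] for v in range(N)) for n in range(N)]
           for u in range(N)]
    # out[y][n] = sum_u C[u][y] * tmp[u][n].
    return [[sum(C[u][y] * tmp[u][n] for u in range(N)) for n in range(N)]
            for y in range(N)]
```

A constant block of value 128 maps to a single DC coefficient of 8 × 128 = 1024, with every other coefficient zero. Similarly, a block whose rows are all identical has nonzero coefficients only in the first row. Both are small instances of the energy compaction described above.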
After transform coding, the coefficient values are quantized into a set of small integer
values that can be coded using a variable-length scheme such as a Huffman code or an
arithmetic code (Wallace 1991). (Each block’s DC, or lowest-frequency, coefficient is also adaptively
predicted from the previous block’s DC value. The term “DC” comes from “direct current”,
i.e., the non-sinusoidal or non-alternating part of a signal.) The quantization step size
is the main parameter controlled by the JPEG quality setting (Figure 2.33).
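The quantization and DC-prediction steps can be sketched as follows. The base table is the example luminance table from Annex K of the JPEG standard, and the quality-to-scale mapping follows the convention used by the IJG libjpeg library. Both are assumptions here, since the standard itself does not define a "quality" number and encoders may choose their own tables. Entropy coding of the resulting integers (Huffman or arithmetic) is not shown, and all function names are hypothetical.

```python
# Example luminance quantization table (JPEG standard, Annex K).
BASE_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_table(quality):
    """Map a 1..100 quality setting to step sizes (IJG-style convention,
    assumed here). Lower quality -> larger steps -> fewer distinct levels."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Clamp to 1..255 so every step is valid for 8-bit baseline tables.
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row]
            for row in BASE_LUMA]


def quantize(coeffs, table):
    """Divide each DCT coefficient by its step size and round to an integer
    (Python's round() breaks exact ties toward the even integer)."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply the integer levels back by the step sizes."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


def dc_differences(dc_values):
    """Code each block's quantized DC value as the difference from the
    previous block's DC value (the first block is predicted from 0)."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out


def dc_reconstruct(diffs):
    """Undo dc_differences by accumulating the differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out
```

At quality 50 this mapping reproduces the base table. At quality 100 every step collapses to 1, and at low quality settings the high-frequency steps saturate at 255, driving most of those coefficients to zero. Neighboring blocks usually have similar DC values, so the differences (for example `[64, 2, -5]` for DC levels `[64, 66, 61]`) are small and cheap to entropy-code.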
With video, it is also usual to perform block-based motion compensation, i.e., to encode
the difference between each block and a predicted set of pixel values obtained from a shifted
block in the previous frame. (The exception is the motion-JPEG scheme used in older DV
camcorders, which is nothing more than a series of individually JPEG-compressed image
frames.) While basic MPEG uses 16 × 16 motion compensation blocks with integer motion
vectors (Le Gall 1991), newer standards use adaptively sized blocks, sub-pixel motion, and
the ability to reference blocks from older frames. In order to recover more gracefully from
failures and to allow for random access to the video stream, predicted P frames are interleaved
among independently coded I frames. (Bi-directional B frames are also sometimes used.)
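Block-based motion compensation can be illustrated with an exhaustive integer-pixel search that minimizes the sum of absolute differences (SAD), one common matching cost. This sketch makes several simplifying assumptions: frames are lists of rows of luma values, the search window is a small square, candidates that fall outside the reference frame are skipped, and there is no sub-pixel refinement or multi-frame referencing. The function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size=16, radius=4):
    """Full search over integer displacements in [-radius, radius]^2.
    Returns (dx, dy, cost) for the lowest-SAD candidate."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would leave the reference frame.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def residual(cur, ref, bx, by, dx, dy, size):
    """Prediction error for the block, i.e., what the DCT and
    quantization stages would then encode."""
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]
```

Only the motion vector and the residual need to be transmitted for a P-frame block. When the prediction is good, the residual is mostly zeros and compresses very well.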
The quality of a compression algorithm is usually reported using its peak signal-to-noise