Color and Motion
A Note About Color
Color images are typically represented using several "color planes." For example, an RGB color image contains a red color plane, a green color plane, and a blue color plane. Each plane contains an entire image in a single color (red, green, or blue, respectively). When overlaid and mixed, the three planes make up the full color image. To compress a color image, the still-image compression techniques described here are applied to each color plane in turn.
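As a concrete sketch, the plane separation can be shown with a tiny hypothetical image (a 2x2 array here; NumPy is used only for brevity, and the pixel values are made up for illustration):

```python
import numpy as np

# A tiny 2x2 RGB image: shape (height, width, 3), one value per color plane.
image = np.array([
    [[255,   0,   0], [  0, 255,   0]],
    [[  0,   0, 255], [128, 128, 128]],
], dtype=np.uint8)

# Separate the image into its three color planes.
red_plane   = image[:, :, 0]
green_plane = image[:, :, 1]
blue_plane  = image[:, :, 2]

# Each plane is a full-resolution single-color image; a still-image
# coder would compress each of the three planes independently.
planes = [red_plane, green_plane, blue_plane]
```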
Video applications often use a color scheme in which the color planes do not correspond to specific colors. Instead, one color plane contains luminance information (the overall brightness of each pixel in the color image) and two more color planes contain color (chrominance) information that, when combined with luminance, can be used to derive the specific levels of the red, green, and blue components of each image pixel.
Such a color scheme is convenient because the human eye is more sensitive to luminance than to color, so the chrominance planes are often stored and encoded at a lower image resolution than the luminance information. Specifically, video compression algorithms typically encode the chrominance planes with half the horizontal resolution and half the vertical resolution of the luminance plane. Thus, for every 16-pixel by 16-pixel region in the luminance plane, each chrominance plane contains one eight-pixel by eight-pixel block. In typical video compression algorithms, a "macro block" is a 16-pixel by 16-pixel region in the video frame that contains four eight-by-eight luminance blocks and the two corresponding eight-by-eight chrominance blocks. Macro blocks allow motion estimation and compensation, described below, to be used in conjunction with sub-sampling of the chrominance planes as described above.
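A sketch of this scheme, assuming the common BT.601 full-range conversion equations and simple 2x2 averaging for the chrominance subsampling (real codecs may use different conversion matrices and subsampling filters):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (H, W, 3; values 0-255) to luminance (Y) and
    chrominance (Cb, Cr) planes using the BT.601 full-range equations."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(plane):
    """Halve resolution in both directions by averaging 2x2 pixel groups."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.full((16, 16, 3), 128.0)          # one 16x16 macro block, mid-gray
y, cb, cr = rgb_to_ycbcr(rgb)
cb_sub, cr_sub = subsample_420(cb), subsample_420(cr)
# The luminance plane keeps 16x16 samples; each chrominance plane
# is reduced to a single 8x8 block, as described in the text.
```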
Adding Motion To The Mix
Using the techniques described above, still-image compression algorithms such as JPEG can achieve good image quality at a compression ratio of about 10:1. The most advanced still-image coders may achieve good image quality at compression ratios as high as 30:1. Video compression algorithms, however, employ motion estimation and compensation to take advantage of the similarities between consecutive video frames. This allows video compression algorithms to achieve good video quality at compression ratios up to 200:1.
In some video scenes, such as a news program, little motion occurs. In this case, the majority of the eight-pixel by eight-pixel blocks in each video frame are identical or nearly identical to the corresponding blocks in the previous frame. A compression algorithm can take advantage of this fact by computing the difference between the two frames, and using the still-image compression techniques described above to encode this difference. Because the difference is small for most of the image blocks, it can be encoded with many fewer bits than would be required to encode each frame independently. If the camera pans or large objects in the scene move, however, then each block no longer corresponds to the same block in the previous frame. Instead, each block is similar to an eight-pixel by eight-pixel region in the previous frame that is offset from the block’s location by a distance that corresponds to the motion in the image. Note that each video frame is typically composed of a luminance plane and two chrominance planes as described above. Obviously, the motion in each of the three planes is the same. To take advantage of this fact despite the different resolutions of the luminance and chrominance planes, motion is analyzed in terms of macro blocks rather than working with individual eight-by-eight blocks in each of the three planes.
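The low-motion case can be sketched as follows, with hypothetical pixel values; note how the difference block is almost entirely zeros, which the transform and entropy-coding stages compress very efficiently:

```python
import numpy as np

# Two co-located 8x8 luminance blocks from consecutive frames of a
# low-motion scene (hypothetical values): the current block differs
# from the previous one at only a single pixel.
prev_block = np.full((8, 8), 100, dtype=np.int16)
curr_block = prev_block.copy()
curr_block[3, 4] += 2                      # tiny change between frames

# Encode the difference instead of the block itself.
difference = curr_block - prev_block
nonzero = np.count_nonzero(difference)     # almost all samples are zero
```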
Motion Estimation and Compensation
Motion estimation attempts to find a region in a previously encoded frame (called a "reference frame") that closely matches each macro block in the current frame. For each macro block, motion estimation results in a "motion vector." The motion vector consists of the horizontal and vertical offsets from the location of the macro block in the current frame to the location in the reference frame of the selected 16-pixel by 16-pixel region. The video encoder typically uses variable-length coding (VLC) to encode the motion vector in the video bit stream. The selected 16-pixel by 16-pixel region is used as a prediction of the pixels in the current macro block, and the difference between the macro block and the selected region (the "prediction error") is computed and encoded using the still-image compression techniques described above. Most video compression standards allow this prediction to be bypassed if the encoder fails to find a good enough match for the macro block. In this case, the macro block itself is encoded instead of the prediction error.
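A minimal sketch of block-matching motion estimation, using an exhaustive search and the sum of absolute differences (SAD) as the matching criterion. For brevity this uses an 8x8 block rather than a full 16x16 macro block, and real encoders use much faster search strategies than the exhaustive scan shown here:

```python
import numpy as np

def motion_estimate(ref, block, top, left, search=4):
    """Exhaustive block matching: find the offset (dy, dx) within
    +/- `search` pixels whose region in the reference frame `ref`
    minimizes the SAD against `block`, which sits at (top, left)
    in the current frame."""
    n = block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue                   # candidate falls outside the frame
            sad = np.abs(ref[y:y+n, x:x+n].astype(int)
                         - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Hypothetical frames: a bright square moves 2 pixels to the right.
ref = np.zeros((32, 32), dtype=np.uint8)
ref[8:16, 8:16] = 200
cur = np.zeros((32, 32), dtype=np.uint8)
cur[8:16, 10:18] = 200

block = cur[8:16, 10:18]                   # 8x8 block in the current frame
vec, err = motion_estimate(ref, block, 8, 10)
# vec is the motion vector; err == 0 means the match is exact, so the
# prediction error block would be all zeros.
```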
Note that the reference frame isn’t always the previously displayed frame in the sequence of video frames. Video compression algorithms commonly encode frames in a different order from the order in which they are displayed. The encoder may skip several frames ahead and encode a future video frame, then skip backward and encode the next frame in the display sequence. This is done so that motion estimation can be performed backward in time, using the encoded future frame as a reference frame. Video compression algorithms also commonly allow the use of two reference frames—one previously displayed frame and one previously encoded future frame. This allows the encoder to select a 16-pixel by 16-pixel region from either reference frame, or to predict a macro block by interpolating between a 16-pixel by 16-pixel region in the previously displayed frame and a 16-pixel by 16-pixel region in the future frame.
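The interpolation case can be sketched as a per-sample average of the two selected regions (a simplification; some standards also allow weighted averages), with hypothetical values chosen so the prediction error is small:

```python
import numpy as np

# Hypothetical 16x16 regions selected by motion estimation from the two
# reference frames (one past, one future) for a single macro block.
past_region   = np.full((16, 16), 90.0)
future_region = np.full((16, 16), 110.0)

# Bidirectional prediction: interpolate between the two regions.
prediction = (past_region + future_region) / 2.0

# Only the (small) prediction error is then encoded.
macro_block = np.full((16, 16), 101.0)
prediction_error = macro_block - prediction
```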
One drawback of relying on previously encoded frames for correct decoding of each new frame is that errors in the transmission of a frame make every subsequent frame impossible to reconstruct. To alleviate this problem, video compression standards occasionally encode one video frame using still-image coding techniques only, without relying on previously encoded frames. These frames are called "intra frames" or "I-frames." If a frame in the compressed bit stream is corrupted by errors, the video decoder must wait until the next I-frame, which doesn't require a reference frame for reconstruction.
Frames that are encoded using only a previously displayed reference frame are called "P-frames," and frames that are encoded using both future and previously displayed reference frames are called "B-frames." In a typical scenario, the codec encodes an I-frame, skips several frames ahead and encodes a future P-frame using the encoded I-frame as a reference frame, then skips back to the next frame following the I-frame. The frames between the encoded I- and P-frames are encoded as B-frames. Next, the encoder skips several frames again, encoding another P-frame using the first P-frame as a reference frame, then once again skips back to fill in the gap in the display sequence with B-frames. This process continues, with a new I-frame inserted for every 12 to 15 P- and B-frames.
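The reordering described above can be sketched as follows, using a hypothetical frame labeling; each B-frame must wait until the reference frame that follows it in display order has been encoded:

```python
# Display order of one hypothetical group of pictures.
display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

def encode_order(frames):
    """Reorder frames so that each reference frame (I or P) is encoded
    before the B-frames that sit between it and the previous reference."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)       # hold B-frames until their future
        else:                         # reference has been encoded
            out.append(f)             # emit the I/P reference first ...
            out.extend(pending_b)     # ... then the B-frames it anchors
            pending_b = []
    return out + pending_b

order = encode_order(display_order)
# → ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
```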