Color Space Conversion
As noted above, video compression algorithms typically represent color images using luminance and chrominance planes. In contrast, video cameras and displays typically mix red, green, and blue light to represent different colors. Therefore, the red, green, and blue pixels captured by a camera must be converted into luminance and chrominance values for video encoding, and the luminance and chrominance pixels output by the video decoder must be converted to specific levels of red, green, and blue for display. The equations for this conversion require about 12 arithmetic operations per image pixel, not including the interpolation needed to compensate for the fact that the chrominance planes have a lower resolution than the luminance plane at the video compression algorithm’s input and output. For a CIF (352 by 288 pixel) image resolution at 15 frames per second, conversion (without any interpolation) requires over 18 million operations per second. This computational load can be significant; when implemented in software, color conversion requires roughly one-third to two-thirds as many processor cycles as the video decoder.
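The conversion and its cost can be sketched in a few lines. This is a minimal illustration, not the equations of any particular standard: it assumes the full-range ITU-R BT.601 coefficients used by JFIF, and the function names `ycbcr_to_rgb` and `conversion_ops_per_second` are hypothetical. Many video codecs use a limited-range variant with different offsets and scale factors.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one YCbCr pixel (each component 0-255) to 8-bit RGB.

    Assumption: full-range BT.601 (JFIF) coefficients; chroma is centered on 128.
    """
    def clamp(value):
        # Round to the nearest integer and saturate to the 8-bit range.
        return max(0, min(255, int(round(value))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def conversion_ops_per_second(width, height, fps, ops_per_pixel=12):
    """Estimate conversion cost, excluding chrominance interpolation."""
    return width * height * fps * ops_per_pixel


# CIF at 15 frames per second, using the 12 operations per pixel cited above.
cif_load = conversion_ops_per_second(352, 288, 15)
```

For CIF at 15 frames per second, `cif_load` is 352 x 288 x 15 x 12 = 18,247,680, which matches the figure of over 18 million operations per second.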
Trends and Conclusions
Video compression algorithms employ a variety of signal-processing tasks such as motion estimation, transforms, and variable-length coding. Although most modern video compression algorithms share these basic tasks, there is enormous variation among algorithms and implementation techniques. For example, the algorithmic approaches and implementation techniques used for performing motion estimation can vary among video encoders even when the encoders comply with the same compression standard. In addition, the most efficient implementation approach for a given signal-processing task can differ considerably from one processor to another, even when a similar algorithmic approach is used on each processor. Finally, the computational load of some tasks, such as motion compensation, can fluctuate wildly depending on the video program content. Therefore, the computational load of a video encoder or decoder on a particular processor can be difficult to predict.
Despite this variability, a few trends can readily be observed:
- Motion estimation is by far the most computationally demanding task in the video compression process, often making the computational load of the encoder several times greater than that of the decoder.
- The computational load of the decoder is typically dominated by the variable-length decoding, inverse transform, and motion compensation functions.
- The computational load of motion estimation, motion compensation, transform, and quantization/dequantization tasks is generally proportional to the number of pixels per frame and to the frame rate. In contrast, the computational load of the variable-length decoding function is proportional to the bit rate of the compressed video bit stream.
- Post-processing steps applied to the video stream after decoding—namely, deblocking, deringing, and color space conversion—contribute considerably to the computational load of video decoding applications. The computational load of these functions can easily exceed that of the video decompression step, and is proportional to the number of pixels per frame and to the frame rate.
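The proportionality trends above allow a rough scaling estimate when the video format changes. The sketch below is only an illustration: the function names are hypothetical, and the 4CIF (704 by 576) target format and the bit rates in the usage note are example values chosen here, not figures from this paper.

```python
def pixel_rate_scale(base, target):
    """Scale factor for tasks whose load tracks pixels per frame and frame rate.

    base and target are (width, height, frames_per_second) tuples.
    """
    base_w, base_h, base_fps = base
    target_w, target_h, target_fps = target
    return (target_w * target_h * target_fps) / (base_w * base_h * base_fps)


def bit_rate_scale(base_bps, target_bps):
    """Scale factor for variable-length decoding, whose load tracks bit rate."""
    return target_bps / base_bps


# Moving from CIF at 15 frames per second to 4CIF at 30 frames per second.
scale = pixel_rate_scale((352, 288, 15), (704, 576, 30))
```

Here `scale` is 8.0: four times as many pixels per frame at twice the frame rate. If the compressed bit rate only doubles over the same change, `bit_rate_scale` gives 2.0. Under these assumptions, variable-length decoding would grow far less than motion compensation, the transforms, and post-processing.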
The memory requirements of a video compression application are much easier to predict than its computational load: in video compression applications, memory use is dominated by the large buffers used to store the current and reference video frames. Only two frame buffers are needed if the compression scheme supports only I- and P-frames; three frame buffers are needed if B-frames are also supported. Post-processing steps such as deblocking, deringing, and color space conversion may require an additional output buffer. The size of these buffers is proportional to the number of pixels per frame.
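As a rough illustration of the frame buffer arithmetic, the sketch below estimates buffer memory for a given resolution. It assumes 8-bit samples with 4:2:0 chrominance subsampling, which gives an average of 1.5 bytes per pixel. That format is an assumption made here for the example, and the function name `frame_buffer_bytes` is hypothetical.

```python
def frame_buffer_bytes(width, height, num_buffers, bytes_per_pixel=1.5):
    """Estimate total frame buffer memory in bytes.

    Assumption: 8-bit 4:2:0 video, i.e. one luminance byte per pixel plus two
    chrominance planes at quarter resolution (1.5 bytes per pixel on average).
    """
    return int(width * height * bytes_per_pixel * num_buffers)


# CIF resolution: I- and P-frames only, then with B-frames as well.
ip_only = frame_buffer_bytes(352, 288, num_buffers=2)
with_b_frames = frame_buffer_bytes(352, 288, num_buffers=3)
```

Under these assumptions, a CIF codec needs 304,128 bytes of frame buffer memory for I- and P-frames only and 456,192 bytes when B-frames are supported. An additional output buffer for post-processing would add to this, with a size that depends on the output pixel format.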
Combined, other factors such as program memory, lookup tables, and intermediate data comprise a significant portion of a typical video application’s memory use, although this portion is often still several times smaller than the frame buffer memory.
Implementing highly optimized video encoding and decoding software requires a thorough understanding of the signal-processing concepts introduced in this paper and of the target processor. Most video compression standards do not specify the method for performing motion estimation. Although reference encoder implementations are provided for most standards, an in-depth understanding of video compression algorithms often allows designers to use more sophisticated motion estimation methods and obtain better results. In addition, a thorough understanding of signal-processing principles, of practical implementations of signal-processing functions, and of the details of the target processor is crucial for efficiently mapping the varied tasks in a video compression algorithm to the processor’s architectural resources.