Video compression algorithms ("codecs") manipulate video signals to dramatically reduce the storage and bandwidth required while maximizing perceived video quality. Understanding the operation of video codecs is essential for developers of embedded systems, processors, and tools targeting video applications. For example, understanding video codecs’ processing and memory demands is key to processor selection and software optimization.
In this article, we explore the operation and characteristics of video codecs. We explain basic video compression algorithms, including still-image compression, motion estimation, artifact reduction, and color conversion. We discuss the demands codecs make on processors and the consequences of these demands.
Video clips are made up of sequences of individual images, or "frames." Therefore, video compression algorithms share many concepts and techniques with still image compression algorithms, such as JPEG. In fact, one way to compress video is to ignore the similarities between consecutive video frames, and simply compress each frame independently of other frames. For example, some products employ this approach to compress video streams using the JPEG still-image compression standard. This approach, known as "motion JPEG" or MJPEG, is sometimes used in video production applications. Although modern video compression algorithms go beyond still-image compression schemes and take advantage of the correlation between consecutive video frames using motion estimation and motion compensation, these more advanced algorithms also employ techniques used in still-image compression algorithms. Therefore, we begin our exploration of video compression by discussing the inner workings of transform-based still-image compression algorithms such as JPEG.
Basic Building Blocks of Digital Image Compression
The image compression techniques used in JPEG and in most video compression algorithms are "lossy." That is, the original uncompressed image can’t be perfectly reconstructed from the compressed data, so some information from the original image is lost. Lossy compression algorithms attempt to ensure that the differences between the original uncompressed image and the reconstructed image are not perceptible to the human eye.
The first step in JPEG and similar image compression algorithms is to divide the image into small blocks and transform each block into a frequency-domain representation. Typically, this step uses a discrete cosine transform (DCT) on blocks that are eight pixels wide by eight pixels high. Thus, the DCT operates on 64 input pixels and yields 64 frequency-domain coefficients. The DCT itself preserves all of the information in the eight-by-eight image block. That is, an inverse DCT (IDCT) can be used to perfectly reconstruct the original 64 pixels from the DCT coefficients. However, the human eye is more sensitive to the information contained in DCT coefficients that represent low frequencies (corresponding to large features in the image) than to the information contained in DCT coefficients that represent high frequencies (corresponding to small features). Therefore, the DCT helps separate the more perceptually significant information from less perceptually significant information. Later steps in the compression algorithm encode the low-frequency DCT coefficients with high precision, but use fewer or no bits to encode the high-frequency coefficients, thus discarding information that is less perceptually significant. In the decoding algorithm, an IDCT transforms the imperfectly coded coefficients back into an 8×8 block of pixels.
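The transform step described above can be illustrated with a short Python sketch. It is not the optimized routine a real codec would use: it applies the textbook orthonormal DCT-II directly, and the names `dct_2d`, `idct_2d`, and `keep_low_frequencies`, as well as the ramp-shaped test block, are hypothetical choices made for illustration. It shows that keeping all 64 coefficients reconstructs the block (up to floating-point rounding), while discarding high-frequency coefficients introduces error.

```python
import math

N = 8  # block edge length, matching the 8x8 blocks described above

def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency index k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

# Precomputed basis matrix: C[k][n] = _basis(k, n).
C = [[_basis(k, n) for n in range(N)] for k in range(N)]

def dct_2d(block):
    """Forward 2-D DCT, computed as C * block * C^T."""
    tmp = [[sum(C[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT, computed as C^T * coeffs * C."""
    tmp = [[sum(C[u][x] * coeffs[u][v] for u in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(tmp[x][v] * C[v][y] for v in range(N)) for y in range(N)]
            for x in range(N)]

def keep_low_frequencies(coeffs, limit):
    """Zero every coefficient whose frequency index sum u + v is >= limit."""
    return [[coeffs[u][v] if u + v < limit else 0.0 for v in range(N)]
            for u in range(N)]

def max_abs_error(a, b):
    """Largest absolute difference between two 8x8 blocks."""
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))

# Hypothetical test block: a smooth ramp of pixel values from 0 to 126.
block = [[16 * x + 2 * y for y in range(N)] for x in range(N)]

coeffs = dct_2d(block)
lossless_error = max_abs_error(block, idct_2d(coeffs))
# limit=2 keeps only the three lowest-frequency coefficients.
lossy_error = max_abs_error(block, idct_2d(keep_low_frequencies(coeffs, 2)))

print(f"round-trip error with all 64 coefficients: {lossless_error:.2e}")
print(f"error keeping only the 3 lowest-frequency coefficients: {lossy_error:.2f}")
```

Simply zeroing coefficients is a cruder rule than the gradual precision reduction the text describes, but it makes the trade-off between coefficient count and reconstruction error easy to see.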
The computations performed in the IDCT are nearly identical to those performed in the DCT, so these two functions have very similar processing requirements. A single two-dimensional eight-by-eight DCT or IDCT requires a few hundred instruction cycles on a typical DSP. However, video compression algorithms must often perform a vast number of DCTs and/or IDCTs per second. For example, an MPEG-4 video decoder operating at CIF (352×288) resolution and a frame rate of 30 fps may need to perform as many as 71,280 IDCTs per second, depending on the video content. Under these conditions, the IDCT function alone would consume more than 40 million cycles per second (that is, over 40 MHz of processor capacity) on a Texas Instruments TMS320C55x DSP (without the DCT accelerator). IDCT computation can take up as much as 30% of the cycles spent in a video decoder implementation.
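The 71,280 figure can be reproduced with a few lines of arithmetic. The sketch below assumes 16×16 macroblocks that each contain six 8×8 blocks (four luma and two chroma, as in 4:2:0 sampling); the article does not state this layout, so treat it as an assumption, and the function names are hypothetical. It also derives the per-IDCT cycle count implied by the 40 MHz figure.

```python
def idcts_per_second(width, height, fps, macroblock_size=16, blocks_per_macroblock=6):
    """Upper bound on 8x8 IDCTs per second, reached if every block is coded.

    Assumes 16x16 macroblocks holding six 8x8 blocks each (4:2:0 sampling).
    """
    # Round up so partial macroblocks at the frame edges are counted.
    mb_columns = (width + macroblock_size - 1) // macroblock_size
    mb_rows = (height + macroblock_size - 1) // macroblock_size
    return mb_columns * mb_rows * blocks_per_macroblock * fps

def cycles_per_idct(cycles_per_second, idct_rate):
    """Average cycle cost per IDCT implied by a total cycle load."""
    return cycles_per_second / idct_rate

rate = idcts_per_second(352, 288, 30)        # CIF at 30 fps
budget = cycles_per_idct(40_000_000, rate)   # 40 MHz of IDCT load

print(rate)           # 71280
print(round(budget))  # 561
```

Under these assumptions, a 40 MHz load corresponds to roughly 560 cycles per IDCT, which is in line with the "few hundred instruction cycles" cited for a single transform.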
Because the DCT and IDCT operate on small image blocks, the memory requirements of these functions are rather small and are typically negligible compared to the size of frame buffers and other data in image and video compression applications. The high computational demand and small memory footprint of the DCT and IDCT functions make them ideal candidates for implementation using dedicated hardware coprocessors.
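To put the memory comparison in rough numbers, the sketch below sets the working storage for one 8×8 block against a single CIF frame buffer. The 16-bit coefficient width, 8-bit samples, and 4:2:0 chroma layout are assumptions chosen for illustration, not figures from this article, and the function names are hypothetical.

```python
def dct_block_bytes(block_size=8, bytes_per_coefficient=2):
    """Storage for one block of coefficients (16-bit coefficients assumed)."""
    return block_size * block_size * bytes_per_coefficient

def frame_buffer_bytes(width, height, bytes_per_sample=1):
    """Storage for one frame: full-resolution luma plus two chroma planes
    subsampled by two in each direction (4:2:0 assumed)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample

block_bytes = dct_block_bytes()
frame_bytes = frame_buffer_bytes(352, 288)  # one CIF frame

print(block_bytes)                 # 128
print(frame_bytes)                 # 152064
print(frame_bytes // block_bytes)  # 1188
```

With these assumptions, one frame buffer is more than a thousand times larger than the data a single DCT or IDCT works on, which is why the transform's working set fits comfortably in a small local memory next to a coprocessor.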