Back in ancient times, when the earth was still cooling and color TV did not yet exist, some very smart engineers were wrestling with how to compress color television signals to only 6 MHz for transmission. Fortunately, the team determined that the human eye is much more sensitive to changes in brightness than to changes in color. This allowed them to reduce the color detail enough (while maintaining the brightness) to transmit color images to our televisions, and the concept of chroma subsampling was born.
Why Chroma Subsampling?
Chroma subsampling reduces the color resolution of a video signal in order to save bandwidth. The color component information (chroma) is sampled at a lower rate than the brightness (luma). Although color information is discarded, the loss is hard to see because human eyes are much more sensitive to variations in brightness than in color.
Continuous tone (natural) images are less impacted by subsampling than synthetic (computer) imagery. That’s because natural images have a lower spatial frequency – meaning the images are typified by large and coarse/uneven features. The uneven or textured features make it difficult to notice the effects of subsampling. Synthetic images have a higher spatial frequency, with sharp edges and fine details, but most people still cannot notice differences between subsampling rates at typical viewing distances.
Understanding Chroma Subsampling Notation
There are several different notations for chroma subsampling, each with its own complicated history, but the most common notation is J:a:b. If color space is more your thing, think of it in terms of YUV (luma plus two chroma components). The notation indicates how the sampling values are applied to a block of pixels that is J pixels wide and 2 rows high (typically 4 x 2):
- J indicates the width of the block in pixels, which is also the number of luma samples taken in each row (usually 4)
- a describes how many chroma samples are taken in the upper row of pixels
- b describes how many additional chroma samples are taken in the lower row
Fundamentally, this notation expresses how to sample just the chroma while leaving the luma intact. You can read it like this: Given 4 pixels wide (J), how many unique color pixels in row 1 (a) and row 2 (b) should be used? Some of the more common chroma expressions are 4:4:4, 4:2:2, and 4:2:0 (4:4:4 keeps the full color resolution, so no subsampling is performed). In 4:2:0, b is 0 because the second row takes no new chroma samples and simply shares the ones from row 1. For reference, Blu-ray discs use 4:2:0, as do most media players and television transmissions. Most video cameras record at 4:2:2 or 4:2:0.
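To make the notation concrete, here is a minimal Python sketch that applies a J:a:b scheme to a single chroma plane. The function name `subsample_chroma` and the sample values are made up for illustration. Averaging neighboring samples is an assumption as well: it is only one of several ways to reduce chroma, and real encoders may filter or position samples differently.

```python
def subsample_chroma(plane, scheme="4:2:0"):
    """Reduce one chroma plane (a list of equal-length rows) by averaging.

    Illustrative only: assumes J = 4, an even number of rows, and a width
    that is a multiple of the horizontal step.
    """
    j, a, b = (int(part) for part in scheme.split(":"))
    if j != 4 or (a, b) not in {(4, 4), (2, 2), (2, 0)}:
        raise ValueError(f"unsupported scheme: {scheme}")

    h_step = j // a          # horizontal pixels covered by one chroma sample
    v_step = 1 if b else 2   # b == 0: row 2 shares the samples from row 1

    out = []
    for y in range(0, len(plane), v_step):
        row = []
        for x in range(0, len(plane[0]), h_step):
            # Collect every original sample that the new sample replaces.
            block = [plane[yy][xx]
                     for yy in range(y, y + v_step)
                     for xx in range(x, x + h_step)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


# One 4 x 2 block of (hypothetical) Cb values.
cb = [[10, 20, 30, 40],
      [50, 60, 70, 80]]

print(subsample_chroma(cb, "4:4:4"))  # [[10, 20, 30, 40], [50, 60, 70, 80]]
print(subsample_chroma(cb, "4:2:2"))  # [[15, 35], [55, 75]]
print(subsample_chroma(cb, "4:2:0"))  # [[35, 55]]
```

The 4 x 2 block keeps all 8 chroma samples at 4:4:4, 4 at 4:2:2, and only 2 at 4:2:0. The luma plane is never touched.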
Why Chroma Subsampling Is Useful
Chroma subsampling is applied across images composed of millions of pixels being displayed multiple times per second, far beyond what a person’s eyes and brain can process. For example, UHD has more than 8 million pixels in every frame, and at 60 fps that adds up to almost 500 million pixels being displayed every second (yikes!).
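In terms of bandwidth, a UHD signal (3840 x 2160) at 60 fps and 4:4:4 sampling (8-bit) equates to roughly 11.9 Gb per second (double yikes!). Here’s how to calculate the bandwidth: multiply the pixels per frame (3840 x 2160) by the frame rate (60) and by the bits per pixel (24, since 4:4:4 carries three 8-bit samples for every pixel).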
The sum of the three sampling values (J:a:b) directly correlates to the stream bandwidth. If no subsampling is performed (aka 4:4:4), the sum is 12. For 4:2:2, the sum is 8. The relevance of that advanced math is that dividing 8 by 12 shows the subsampled stream needs about 67% of the original bandwidth, so you can easily determine the percentage of bandwidth reduction (hint: it’s 33%!) gained by subsampling the chroma (~7.96 Gb/s). For 4:2:0, the sum is 6, which results in a 50% bandwidth reduction (~5.97 Gb/s).
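The arithmetic above can be checked with a few lines of Python. This is a rough sketch of uncompressed video bandwidth only. The helper name `bandwidth_gbps` is made up for illustration, and it assumes the simple sample-counting reading of J:a:b (J luma samples per row, plus a + b samples for each of the two chroma components in a J x 2 block), which holds for the common schemes discussed here.

```python
def bandwidth_gbps(width, height, fps, scheme="4:4:4", bit_depth=8):
    """Uncompressed video bandwidth in gigabits per second (1 Gb = 1e9 bits)."""
    j, a, b = (int(part) for part in scheme.split(":"))
    # A J x 2 block holds 2*J luma samples and (a + b) samples
    # for each of the two chroma components.
    samples_per_block = 2 * j + 2 * (a + b)
    samples_per_pixel = samples_per_block / (2 * j)
    bits_per_second = width * height * fps * samples_per_pixel * bit_depth
    return bits_per_second / 1e9


full = bandwidth_gbps(3840, 2160, 60, "4:4:4")
for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    rate = bandwidth_gbps(3840, 2160, 60, scheme)
    saved = 100 * (1 - rate / full)
    print(f"{scheme}: {rate:.2f} Gb/s ({saved:.0f}% reduction)")
# 4:4:4: 11.94 Gb/s (0% reduction)
# 4:2:2: 7.96 Gb/s (33% reduction)
# 4:2:0: 5.97 Gb/s (50% reduction)
```

Passing `bit_depth=10` gives a quick estimate for 10-bit video under the same assumptions.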
While chroma subsampling has a small impact on image detail – especially for images in motion – it offers a significant reduction in bandwidth. As an added benefit, visual artifacts created from chroma subsampling are much less noticeable/impactful than artifacts created from compression (blocking and ringing are more distracting than subsampling edges). Happy programming!