At CES 2017, HDMI Licensing announced the specifications for version 2.1, which is intended for release later this year. It will support a bevy of new features, including 48Gbps data rates, eARC enhancements, Game Mode variable refresh rates, and more. What caught my eye, though, was support for 10K video at 120fps. That is a huge amount of data – much more than the human eye can completely ingest in many scenarios – and raises the question: “Given today’s infrastructure, how much will we have to compress the streams, and what’s the video going to look like?”
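To put a rough number on "a huge amount of data," here is a back-of-the-envelope sketch. The 10240 x 4320 pixel grid, 8 bits per color component, and the absence of chroma subsampling, blanking, and link-encoding overhead are all assumptions made for illustration, not figures from the announcement.

```python
# Back-of-the-envelope raw (uncompressed) video data rate.
# Assumptions (not from the HDMI announcement): 10K = 10240 x 4320 pixels,
# 8 bits per color component, three components per pixel (no chroma
# subsampling), and no blanking or link-encoding overhead.

def raw_bitrate_gbps(width, height, fps, bits_per_component=8, components=3):
    """Return the uncompressed video bit rate in gigabits per second."""
    bits_per_second = width * height * fps * bits_per_component * components
    return bits_per_second / 1e9

LINK_GBPS = 48  # HDMI 2.1 data rate quoted above

rate = raw_bitrate_gbps(10240, 4320, 120)
ratio = rate / LINK_GBPS

print(f"raw rate: {rate:.1f} Gbps")
print(f"minimum compression ratio for a {LINK_GBPS} Gbps link: {ratio:.2f}:1")
```

Under these assumptions the raw stream is roughly 127 Gbps, or about 2.65 times what the link carries, and that is before considering the far narrower pipes used for streaming delivery.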
Consumers are already used to the high compression ratios (and any accompanying artifacts) necessary for delivering video over wireless and mobile (H.264/AVC or H.265/HEVC), but there’s still a tipping point at which the consumer stops watching if the video quality is too poor. This can have a significant negative impact on revenue for content providers. One study [1] indicated that in 2012, global content brands lost $2.16 billion of revenue due to poor quality video streams, and were expected to lose up to $20 billion through 2017 as a result of quality issues. The same study indicated that roughly 60% of all video streams experienced quality degradation.
Although streaming technology has made significant advancements in quality during the last five years (albeit somewhat offset by the increased demand in bandwidth), the fact remains that poor video quality continues to challenge viewers’ patience and is a significant hurdle for video content vendors. In B2B settings, poor-quality video often results in trouble tickets with highly actionable comments like “the video’s bad,” “choppy,” or “laggy.” There are several types of compression artifacts that can be the cause of a “bad” video, though, so it may be useful for troubleshooting to be able to identify the different artifacts and recognize when and where you’re most likely to encounter them.
Temporal vs. Spatial Artifacts
Artifacts are first categorized by whether they’re time/sequence-based (temporal) or location-based (spatial). If you can see the artifact when the video is paused, then it’s probably a spatial artifact. If it’s much more visible while the video plays, then it’s likely temporal.
The compression algorithm being used will rely either on I-frames alone (intraframe) or on P- and B-frames as well (interframe). I-frame-based algorithms like MJPEG are less susceptible to temporal artifacts since I-frames are single image encodings, while P‑frames and B‑frames hold only part of the image information. Interframe algorithms therefore typically achieve better compression rates, but at the expense of propagating compression losses to subsequent frame predictions – this propagation and “rounding on rounding” is the origin of many temporal artifacts. Objective evaluation of temporal artifacts is more challenging, though, and popular video quality assessment (VQA) models often fail to account for them.
Basis Pattern (Spatial)
The basis pattern effect takes its name from the basis functions (mathematical transforms) used by transform-based compression algorithms. The artifact appears similar to the ringing effect. However, whereas the ringing effect is restricted to sharp edges or lines, the basis pattern is not. It usually occurs in regions that have texture, like trees, fields of grass, waves, etc. Typically, if viewers notice a basis pattern, it has a strong negative impact on perceived video quality.
Blocking (Spatial)
Blocking is known by several names – including tiling, jaggies, mosaicing, pixelating, quilting, and checkerboarding – and it occurs whenever a complex (compressed) image is streamed over a low bandwidth connection (imagine a golf ball being passed through a garden hose). At decompression, the output of certain decoded blocks makes surrounding pixels appear averaged together to look like larger blocks. As displays increase in size, blocking typically becomes more visible (assuming resolution remains the same). However, an increase in resolution makes blocking artifacts smaller in terms of the image size and therefore less visible at a given viewing distance.
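The "averaged together" effect can be illustrated with a toy one-dimensional sketch. It assumes the extreme case where each block keeps only its mean value, which is what remains when only the DC coefficient of a block transform survives quantization. The 8-sample block size and the ramp signal are illustrative choices, not taken from any particular codec.

```python
# Toy 1-D illustration of blocking. Each block is replaced by its mean,
# mimicking a decoder that received only the DC coefficient per block.
# BLOCK and the test signal are assumptions chosen for illustration.

BLOCK = 8

def decode_dc_only(samples, block=BLOCK):
    """Replace every block of samples with the block's mean value."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out

ramp = list(range(16))          # smooth gradient: 0, 1, 2, ..., 15
decoded = decode_dc_only(ramp)  # two flat blocks

jump_original = ramp[8] - ramp[7]        # step across the block boundary
jump_decoded = decoded[8] - decoded[7]   # same boundary after decoding

print(decoded)
print(jump_original, jump_decoded)
```

The smooth gradient turns into two flat plateaus, and the difference across the block boundary grows from 1 to 8. That artificial discontinuity at the block edge is what the eye picks up as a tile.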
Blurring (Spatial)
Blurring is a result of loss of high spatial frequency image detail, typically at sharp edges. Colloquially referred to as “fuzziness” or “unsharpness,” it makes discrete objects – as opposed to the entire video – appear out of focus.
Color Bleeding (Spatial)
Color bleeding, as its name suggests, occurs when the edges of one color in the image unintentionally bleed or overlap into another color. Assuming the source video wasn’t oversaturated, this artifact is caused by coarse chroma subsampling, where color information is stored at a lower resolution than brightness.
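A minimal sketch of how subsampled chroma smears across a color edge follows. It assumes a single row of chroma (Cb) samples, 2:1 horizontal subsampling by averaging pairs, and upsampling by simple replication. Real encoders and decoders use more elaborate filters, so this shows the mechanism rather than any specific codec's behavior.

```python
# Toy illustration of color bleeding from chroma subsampling.
# Assumptions: one row of Cb samples, 2:1 horizontal subsampling by pair
# averaging, and nearest-neighbor (replication) upsampling at the decoder.

def subsample_pairs(chroma):
    """Average each horizontal pair of chroma samples (2:1 subsampling)."""
    return [(chroma[i] + chroma[i + 1]) // 2 for i in range(0, len(chroma), 2)]

def upsample_replicate(sub):
    """Restore full width by repeating every subsampled value twice."""
    out = []
    for value in sub:
        out.extend([value, value])
    return out

# A sharp color edge that does not line up with the pair boundaries.
cb_original = [16, 16, 16, 240, 240, 240]
cb_decoded = upsample_replicate(subsample_pairs(cb_original))

print(cb_original)
print(cb_decoded)
```

The two pixels straddling the edge both end up with a blended chroma value of 128, while luma (not shown) keeps its full-resolution edge. Each color therefore appears to leak one pixel into its neighbor.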
Flickering (Temporal)
Flickering generally refers to frequent luminance or chrominance changes over time (similar to a candle’s flame), and is often broken out as fine-granularity flickering and coarse-granularity flickering. Fine-granularity flickering is typically seen in slow-motion sequences with large motion or texture details, often appearing to flash at high frequency. It can be very eye-catching and annoying to viewers. Coarse-granularity flickering refers to sudden luminance changes in large areas of the video. The most likely cause of this type of flickering is the use of group-of-picture (GoP) structures in the compression algorithm. I-frame-based algorithms don’t utilize GoP structures and are not susceptible to this type of artifact.
Floating (Temporal)
Floating refers to illusory motion in certain regions while the surrounding areas remain static. Visually, these regions appear as if they were floating on top of the surrounding background. This is the result of the encoder erroneously skipping predictive frames. There are two types of floating: texture floating and edge floating. Texture floating deals with large areas of texture, like surfaces of water or trees, while edge floating relates to the boundaries of large texture areas, such as the shoreline of a lake.
Jerkiness (Temporal)
Jerkiness, or judder, is the perceived uneven or wobbly motion due to frame sampling. It’s often caused by the conversion of 24 fps movies to a 30 or 60 fps video format. The process, known as “3:2 pulldown” or “2:3 pulldown,” can’t create a flawless copy of the original movie because 24 does not divide evenly into 30 or 60. The perception of judder is reduced at higher frame rates because the motion of objects is reduced between frames. Traditionally, jerkiness is not considered a true compression artifact.
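The uneven cadence is easy to see in a short sketch. It assumes the 24 fps to 60 Hz progressive case, where whole frames are repeated; for interlaced 30 fps video the same 2:3 pattern applies to fields rather than frames. The function name is hypothetical.

```python
# Sketch of the 2:3 pulldown cadence for 24 fps film on a 60 Hz display.
# Assumption: progressive output, so whole frames are repeated.
from itertools import cycle

def pulldown_2_3(film_frames):
    """Repeat film frames alternately 2 and 3 times."""
    out = []
    for frame, repeats in zip(film_frames, cycle((2, 3))):
        out.extend([frame] * repeats)
    return out

shown = pulldown_2_3("ABCD")
print("".join(shown))        # AABBBCCDDD

one_second = pulldown_2_3(range(24))
print(len(one_second))       # 60
```

Frames A and C stay on screen for 2/60 of a second while B and D stay for 3/60, so an object moving at constant speed appears to advance in alternating short and long steps. That alternation is the judder.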
Mosquito Noise (Temporal)
Mosquito noise, or “edge busyness,” gets its name from resembling a mosquito flying around a person’s head and shoulders. A variant of flickering, it’s typified as haziness and/or shimmering around high-frequency content (sharp transitions between foreground entities and the background or hard edges), and can sometimes be mistaken for ringing.
Ringing (Spatial)
Also known as echoing or ghosting, ringing takes the form of a “halo,” band, or “ghost” near sharp edges. Unlike mosquito noise, though, it doesn’t move around frame to frame. During image reconstruction (decompression), there’s insufficient data to form as sharp an edge as in the original. Mathematically, this causes both over- and undershooting to occur at the samples around the original edge. It’s the over- and undershooting that typically introduces the halo effect, creating a silhouette-like shade parallel to the original edge.
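The over- and undershooting can be reproduced with a small sketch: take a one-dimensional step edge, transform it with a DCT, discard the high-frequency coefficients, and reconstruct. The unnormalized DCT-II written out here, the 16-sample edge, and the choice to keep 4 coefficients are assumptions for illustration. Real codecs quantize coefficients rather than simply zeroing them.

```python
# Toy illustration of ringing: a step edge rebuilt from too few DCT
# coefficients overshoots and undershoots next to the edge.
# Assumptions: 1-D signal, unnormalized DCT-II, hard truncation to KEEP terms.
import math

def dct(x):
    """Unnormalized DCT-II of a list of samples."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [coeffs[0] / n
            + 2 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
                          for k in range(1, n))
            for i in range(n)]

KEEP = 4
edge = [0.0] * 8 + [1.0] * 8                 # sharp edge from 0 to 1
coeffs = dct(edge)
truncated = coeffs[:KEEP] + [0.0] * (len(coeffs) - KEEP)
rebuilt = idct(truncated)

print([round(v, 3) for v in rebuilt])
print("max:", round(max(rebuilt), 3), "min:", round(min(rebuilt), 3))
```

The rebuilt signal rises above 1 on the bright side of the edge and dips below 0 on the dark side, even though the original never leaves the range 0 to 1. In a two-dimensional image those excursions show up as the halo running parallel to the edge.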
Staircase Noise (Spatial)
Staircase noise is a special case of blocking along a diagonal or curved edge. Rather than rendering as smooth, it takes on the appearance of stair steps, hence the name. Depending on root cause, staircasing can be categorized as a compression artifact (insufficient sampling rates) or a scaler artifact (spatial resolution is too low).
Test Your Understanding
Now that you’ve read through this post, go back to the large “hero” image at the top of the article and see if you can determine the type of artifact indicated in each callout. Answers are below.
Callout 1: blocking at the ceiling
Callout 2: blurring on the right portion
Callout 3: ringing at the window borders on the left
Callout 4: staircase noise – stair steps along the edges of the screens in the front
Callout 5: basis patterns between the screens
Callout 6: blocking on the whiteboard in the back
[1] Conviva, Viewer Experience Report, 2013.