Since the last update, we've released and seen wide adoption of the new Theora 1.1 'Thusnelda' encoder. The Thusnelda project concentrated on code review, cleanup, and improving efficiency of objectively measurable 'low hanging fruit'.

Encoder improvements continue with the upcoming 1.2 release, 'Ptalarbvorm' [Tall' - ar - vorm] (the P is silent, the b is optional). The Ptalarbvorm release brings substantial subjective and psychovisual improvements to the encoder. It also addresses those aspects of Thusnelda encoding that were objective improvements but did not subjectively improve over the heuristic approaches in 1.0.

Ptalarbvorm is currently in a branch at http://svn.xiph.org/experimental/derf/theora-ptalarbvorm/.

Current Improvements in 1.2 (Ptalarbvorm)

These are changes already in SVN HEAD for Ptalarbvorm; changes in progress that have not yet landed are discussed in the next section.

Activity masking

Fixed quantization within a given frame is a perceptually suboptimal way to encode video. High variance (high contrast) areas use vastly more bits than needed, while low variance (low contrast) areas may be starved to the point of getting no representational bits at all. The optimal error measurement may test very well, but the human eye immediately notices the lost low-contrast detail.
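To make the starvation concrete, here is a minimal sketch in Python. The coefficient values and the step size are invented for illustration and are not taken from Theora, and the quantizer is a plain uniform one rather than Theora's quantization matrices.

```python
def quantize(coeffs, step):
    """Uniform quantizer: map each coefficient to its nearest quantizer level."""
    return [round(c / step) for c in coeffs]


def nonzero_count(levels):
    """Number of quantized levels that survive, i.e. actually get coded."""
    return sum(1 for v in levels if v != 0)


# Hypothetical AC coefficients for two blocks in the same frame.
busy_block = [180.0, -95.0, 60.0, -42.0, 31.0, -20.0]  # high contrast
faint_block = [9.0, -7.0, 5.0, -4.0, 3.0, -2.0]        # low contrast

step = 24.0  # one fixed quantizer step for the whole frame (assumed value)

busy_levels = quantize(busy_block, step)
faint_levels = quantize(faint_block, step)

print("busy block levels: ", busy_levels)
print("faint block levels:", faint_levels)
```

With these made-up numbers every coefficient of the busy block survives quantization while the faint block quantizes to all zeros, which is the "no representational bits at all" case.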

This is similar to signal masking in audio. Although the ear may have a dynamic range of over 120dB, within any 'critical band' the perceptible S/N depth from the loudest present sound is approximately 30dB or less. Loud tones drown out nearby softer tones, though far away soft tones may be unaffected. Fixed quantization across an unnormalized audio spectrum performs very badly.

Activity masking in video exploits a similar tendency for the eye to see approximately as 'deeply' into low-contrast areas as it does into high-contrast areas. It's neither quite the same nor as pronounced an effect in video as in audio, as high-energy areas do generally require more bits for perceptually pleasing representation. That said, a simple fixed quantization as done in 1.0 and 1.1 produces relatively poor results.

I first alluded to activity masking in demo 8 ('adaptive quantization heuristics'), and Tim landed a production implementation some months ago in Ptalarbvorm. From the SVN r16812 log entry:

"First attempt at activity masking. This adjusts the weight of the distortion based on a simple block type classification (smooth, edge, texture) and the variance of that block. These adjustments naturally extend the existing adaptive quantization implementation from r16314. This produces a significant reduction in PSNR (0.4 to 0.8 dB), but an even more significant increase in SSIM (1.5 to 3.0 dB), and visually fixes many of the problems produced by Thusnelda's RDO (e.g., smearing of smooth background regions during pans, and other SKIP-related issues)."

Mouse over image to see Ptalarbvorm (early pre-1.2) with activity masking compared to Thusnelda (1.1).

Fixed quantization across the entire video frame results in high-contrast 'busy' sections of the image getting the lion's share of the bits while low-contrast regions can wash out entirely. Thusnelda used adaptive quantization, but only as dictated by objective RDO measurement, which worked out to visual behavior similar to fixed quantization. High-contrast areas of the image look good, but lower-contrast areas (such as the background trees) wash out almost entirely.

Ptalarbvorm by comparison weights bit allocation by block variance. This steals a tiny amount of visual accuracy from high-detail areas in order to preserve a great deal more detail and texture in lower-contrast/variance areas of the frame.
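A rough sketch of the idea follows; it is not the actual Ptalarbvorm code. The r16812 log says the distortion weight depends on block classification and variance but does not give the formula, so the `masking_weight` function, its reference variance, and its clamp are all hypothetical.

```python
def block_variance(pixels):
    """Population variance of the luma samples in one block."""
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n


def masking_weight(variance, reference_variance=100.0, floor=0.5):
    """Hypothetical weight: errors in smooth blocks count for more.

    A block at the reference variance gets weight 1.0, a perfectly flat
    block gets 2.0, and very busy blocks are clamped at `floor` so they
    only give up a little accuracy.
    """
    weight = 2.0 * reference_variance / (variance + reference_variance)
    return max(floor, weight)


def weighted_distortion(sse, pixels):
    """Scale a block's squared error by its masking weight before RDO sees it."""
    return masking_weight(block_variance(pixels)) * sse


flat_block = [10] * 64          # variance 0
medium_block = [0, 20] * 32     # variance 100
busy_block = [0, 255] * 32      # very high variance

for name, block in (("flat", flat_block), ("medium", medium_block), ("busy", busy_block)):
    print(name, masking_weight(block_variance(block)))
```

The same squared error therefore costs more in the flat block than in the busy one, which pushes the rate-distortion search to spend bits where the eye would notice their absence.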

The original 'Parkrun' encodes from Thusnelda and early Ptalarbvorm can be found at http://people.xiph.org/~greg/video/ptalarbvorm/.

Altered Skip weighting

Two changes to SKIP block handling noticeably improve the quality of motion in low and mid-rate clips. The first eliminates 'DC force', wherein sufficient DC can force block coding outside of RDO decisions; the second accounts for the perceptual importance of tracking motion in the SKIP decision.

Changes to Ptalarbvorm SKIP decision behavior show a modest SSIM gain in most situations, and a quite large subjective improvement at low bitrates for clips with a large amount of motion in smooth areas. Graph generated against Ptalarbvorm pre-change and Ptalarbvorm post-change (r17174).

These changes eliminate a substantial amount of the 'dirty window' effect where block boundaries of SKIPped blocks add fixed-position textured noise to smooth motion. As suggested by the graphs, this improvement is especially obvious at very low bitrates. The following video is one clip where the effect is striking; especially watch the sunflower petals in the right of the frame.

Mouse over image to see Ptalarbvorm compared to a Thusnelda (1.1.1) encode.

Thusnelda 1.1.1 shows the 'dirty window' effect and pronounced blocking across the image, though it's most noticeable in smooth regions that are generating lots of SKIP blocks, e.g., the sunflower petals to the right. The effect is far more striking when watching the Thusnelda version of the video.

Rather than completely SKIPping, Ptalarbvorm is coding only motion vectors, which both results in less distortion piling up over time and also allows distortion to track motion, resulting in a much more pleasing effect. The Ptalarbvorm encode is also at a slightly lower rate (679kbps vs. 709kbps) to boot. Again, the difference is much more striking when watching the actual Ptalarbvorm encoded video.
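The effect on the mode decision can be sketched with a generic rate-distortion cost, J = D + lambda * R. This is an illustration only: the distortion and bit figures are invented, and the multiplicative `skip_penalty` is a stand-in assumption for however the encoder actually accounts for the perceptual cost of not tracking motion.

```python
def rd_cost(distortion, bits, lam):
    """Classic Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lam * bits


def choose_mode(candidates, lam, skip_penalty=1.0):
    """Pick the cheapest mode; SKIP distortion is scaled by skip_penalty.

    candidates maps a mode name to a (distortion, bits) pair.
    skip_penalty > 1.0 is a hypothetical way to say "a stale block in a
    moving region looks worse than its squared error suggests".
    """
    best_mode, best_cost = None, float("inf")
    for mode, (distortion, bits) in candidates.items():
        if mode == "SKIP":
            distortion = distortion * skip_penalty
        cost = rd_cost(distortion, bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode


# Invented numbers for one block in a smooth, slowly moving region.
candidates = {
    "SKIP": (400.0, 1),      # copy the co-located block, almost free
    "MV_ONLY": (380.0, 12),  # motion vector, no residual
    "CODED": (150.0, 90),    # motion vector plus residual
}

print(choose_mode(candidates, lam=4.0))                    # purely objective decision
print(choose_mode(candidates, lam=4.0, skip_penalty=1.5))  # with the perceptual penalty
```

With a purely objective cost the block is SKIPped; once SKIP is penalized, the motion-vector-only mode wins for a few extra bits, which matches the behavior described above.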

Decoder Optimization

Below are decoder performance measures comparing Thusnelda (1.1.1) to current Ptalarbvorm in SVN. These graphs reflect only x86_64 decode, which was already SIMD optimized, and do not look at the substantial improvements being rolled into mainline for decode on embedded platforms (ARM and TI).

Thusnelda 1.1.1 vs. Current Ptalarbvorm head decode speed on x86_64. x86_64 shows the least speed improvement for the optimizations made, primarily because x86_64 was already heavily SIMD optimized and is substantially less memory/register pressured than less powerful processors, 32-bit x86 included. Optimization for embedded architectures is yet to land.

See the Ptalarbvorm SVN logs for descriptions of specific decoder optimization work.

SSIM measures

Below, we find a quick round of luma SSIM measures comparing Thusnelda (1.1.1) to current Ptalarbvorm in SVN. Yes, these are only QCIF resolution, mainly because unaligned exact SSIM is expensive to compute, and I didn't want Greg to spend days on it.

Luma SSIM measurements of several standard QCIF test samples comparing Ptalarbvorm and Thusnelda 1.1.1. Mouse over image to see Thusnelda and Ptalarbvorm graphs in isolation.
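For readers unfamiliar with the metric, here is a small pure-Python sketch of windowed luma SSIM. It is not the tool used to produce these graphs: the uniform 8x8 window and the constants (the usual 0.01 and 0.03 factors for 8-bit data) are assumptions, as is the dB conversion, which follows the common -10*log10(1 - SSIM) convention.

```python
import math

C1 = (0.01 * 255) ** 2  # stabilizer for the mean term (8-bit samples assumed)
C2 = (0.03 * 255) ** 2  # stabilizer for the variance term


def ssim_window(x, y):
    """SSIM of two equally sized lists of luma samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx * mx + my * my + C1) * (vx + vy + C2)
    return num / den


def mean_ssim(ref, test, win=8, step=1):
    """Average SSIM over all windows; step=1 visits every unaligned position."""
    height, width = len(ref), len(ref[0])
    scores = []
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            rows = range(top, top + win)
            cols = range(left, left + win)
            x = [ref[r][c] for r in rows for c in cols]
            y = [test[r][c] for r in rows for c in cols]
            scores.append(ssim_window(x, y))
    return sum(scores) / len(scores)


def ssim_db(ssim):
    """Express SSIM in dB, assuming the -10*log10(1 - SSIM) convention."""
    if ssim >= 1.0:
        return float("inf")
    return -10.0 * math.log10(1.0 - ssim)


# Tiny synthetic 16x16 'frames': a gradient and a flat gray version of it.
reference = [[r * 16 + c for c in range(16)] for r in range(16)]
washed_out = [[128 for _ in range(16)] for _ in range(16)]

print(mean_ssim(reference, reference))
print(mean_ssim(reference, washed_out))
```

Sliding the window one pixel at a time (`step=1`) evaluates 81 windows on this 16x16 example versus 4 for block-aligned windows (`step=8`), which hints at why exact unaligned SSIM gets expensive on larger frames.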

Changes Still Upcoming for 1.2 (Ptalarbvorm)

A number of changes are in progress but have not yet landed in the Ptalarbvorm branch. They will be part of a final 1.2 release.

ARM and c64x merges

Robin Watts has been working under contract with Google to merge his ARM implementation of Theora (and Vorbis/Tremor) back into the Xiph codebases under a BSD license.

Similarly, David Schleef has been working with Mozilla to port Theora playback to the c64x DSP coprocessor on the OMAP3 SoC manufactured by Texas Instruments. Quoting his Leonara project page, "This series of SoCs is used in a variety of mobile devices including the Palm Pre, Motorola Droid, and Nokia's N series of phones. It has gained a following in the open source community because of the Beagle Board, which is an inexpensive introduction to embedded devices." Most of the devices featuring the c64x DSP have enough CPU to play back Theora without assistance; however, the c64x can do so with far less battery usage.

These changes will land before the 1.2 release.

Quantization matrix adjustments

Further quantization matrix tuning is needed to:

Temporal RDO

Temporal RDO evaluates how long any bit expenditure remains important. For example, the pixels in a static scene background might appear to have no more value than foreground action when analyzing a single frame, but over many frames it becomes apparent that these pixels change very little, and so a relatively lavish bit expenditure up front pays off over many subsequent frames.

Mouse over image to see a very simple temporal RDO implementation compared to stock Thusnelda (1.1).

An obvious extreme test sample (which relies heavily on temporal RDO and very little else) is the "Lossless Touhou" sample. As Theora to date has had no analysis lookahead whatsoever, it has had no way of knowing the majority of the image is static and so this is a sample in which it has traditionally fared embarrassingly poorly.

This demonstration is not of a proper temporal RDO; it is literally a five minute hack that does little more than count block SKIPs and apply the result as weighting. It does however show the improvement that can be had even with an extremely naive implementation. The full SKIP-RDO-hack video clip can be found at http://myrandomnode.dyndns.org:8080/~gmaxwell/theora/oo.ogv
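The details of the hack are not given here, so the following is only a guess at its general shape: count how often each block is SKIPped across a run of frames and turn that count into a per-block weight. The linear mapping and the `max_boost` parameter are hypothetical.

```python
def skip_counts(skip_maps):
    """Count, per block, how many frames SKIPped it.

    skip_maps is a list of frames; each frame is a list of booleans,
    one per block, True where the block was SKIPped.
    """
    counts = [0] * len(skip_maps[0])
    for frame in skip_maps:
        for index, skipped in enumerate(frame):
            if skipped:
                counts[index] += 1
    return counts


def temporal_weights(counts, num_frames, max_boost=4.0):
    """Hypothetical mapping: blocks that stay static longer get a larger weight.

    A block never SKIPped gets 1.0; a block SKIPped in every frame gets
    max_boost, on the theory that bits spent on it keep paying off.
    """
    return [1.0 + (max_boost - 1.0) * c / num_frames for c in counts]


# Two blocks over four frames: a static background block and a moving one.
skip_maps = [
    [True, False],
    [True, False],
    [True, True],
    [True, False],
]

counts = skip_counts(skip_maps)
weights = temporal_weights(counts, num_frames=len(skip_maps))
print(counts, weights)
```

A real temporal RDO would reason about how long the coded bits stay useful rather than about raw SKIP counts, but even this kind of crude weighting moves bits toward the static background.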

True Temporal RDO will land soon as part of 1.2; more demos then!

Upcoming for 1.3 (Eyjafjallajökull ...err... Volcano)


At the moment, the main feature mentioned as landing post 1.2 is encoder multithreading.