Since the last update, we've released and seen wide adoption of the new Theora 1.1 'Thusnelda' encoder. The Thusnelda project concentrated on code review, cleanup, and improving efficiency of objectively measurable 'low hanging fruit'.
Encoder improvements continue with the upcoming 1.2 release, 'Ptalarbvorm' [Tall' - ar - vorm] (the P is silent, the b is optional). Ptalarbvorm brings substantial subjective and psychovisual improvements to the encoder. It also addresses those aspects of Thusnelda encoding that improved objective measurements but did not subjectively improve on the heuristic approaches in 1.0.
Ptalarbvorm is currently in a branch at http://svn.xiph.org/experimental/derf/theora-ptalarbvorm/.
These are changes already in SVN HEAD for Ptalarbvorm; changes in progress that have not yet landed are discussed in the next section.
Fixed quantization within a given frame is a perceptually suboptimal way to encode video. High variance (high contrast) areas use vastly more bits than needed, while low variance (low contrast) areas may be starved to the point of getting no representational bits at all. The optimal error measurement may test very well, but the human eye immediately notices the lost low-contrast detail.
This is similar to signal masking in audio. Although the ear may have a dynamic range of over 120dB, within any 'critical band' the perceptible S/N depth from the loudest present sound is approximately 30dB or less. Loud tones drown out nearby softer tones, though far away soft tones may be unaffected. Fixed quantization across an unnormalized audio spectrum performs very badly.
Activity masking in video exploits a similar tendency for the eye to see approximately as 'deeply' into low-contrast areas as it does into high-contrast areas. It's neither quite the same nor as pronounced an effect in video as in audio, as high-energy areas do generally require more bits for perceptually pleasing representation. That said, simple fixed quantization as done in 1.0 and 1.1 produces relatively poor results.
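As a rough illustration of the idea, the sketch below weights a block's distortion by its variance, so that rate-distortion decisions protect low-contrast blocks and spend less on busy ones. It is not the Ptalarbvorm implementation: the function names, the pivot, the exponent, and the clamping range are all assumptions made up for this example.

```python
def block_variance(block):
    """Population variance of a block given as a list of pixel rows."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def distortion_weight(variance, pivot=100.0, exponent=0.25, lo=0.5, hi=2.0):
    """Hypothetical masking weight (all constants are illustrative).

    Blocks with variance below the pivot get a weight above 1, so their
    errors count for more; busy blocks get a weight below 1.
    """
    raw = (pivot / (variance + 1.0)) ** exponent
    return max(lo, min(hi, raw))


def masked_rd_cost(sse, bits, lam, variance):
    """Rate-distortion cost with the distortion term scaled by the weight."""
    return distortion_weight(variance) * sse + lam * bits


# A perfectly flat 8x8 block and a black/white checkerboard.
flat = [[128] * 8 for _ in range(8)]
busy = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

flat_weight = distortion_weight(block_variance(flat))
busy_weight = distortion_weight(block_variance(busy))
```

With these made-up constants the flat block receives the maximum weight (2.0) and the checkerboard the minimum (0.5), so the same squared error counts four times as heavily in the flat block as in the busy one.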
I first alluded to activity masking in demo 8 ('adaptive quantization heuristics'), and Tim landed a production implementation some months ago in Ptalarbvorm. From the SVN r16812 log entry:
"First attempt at activity masking. This adjusts the weight of the distortion based on a simple block type classification (smooth, edge, texture) and the variance of that block. These adjustments naturally extend the existing adaptive quantization implementation from r16314. This produces a significant reduction in PSNR (0.4 to 0.8 dB), but an even more significant increase in SSIM (1.5 to 3.0 dB), and visually fixes many of the problems produced by Thusnelda's RDO (e.g., smearing of smooth background regions during pans, and other SKIP-related issues)."
Two changes to SKIP block handling noticeably improve the quality of motion in low and mid-rate clips. The first eliminates 'DC force', wherein sufficient DC can force block coding outside of RDO decisions. The second accounts for the perceptual importance of tracking motion in the SKIP decision.
These changes eliminate a substantial amount of the 'dirty window' effect where block boundaries of SKIPped blocks add fixed-position textured noise to smooth motion. As suggested by the graphs, this improvement is especially obvious at very low bitrates. The following video is one clip where the effect is striking; especially watch the sunflower petals in the right of the frame.
See below for decoder performance measurements comparing Thusnelda (1.1.1) to current Ptalarbvorm in SVN. These graphs reflect only x86_64 decode, which was already SIMD optimized, and do not look at the substantial improvements being rolled into mainline for decode on embedded platforms (ARM and TI).
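To make the second change concrete, here is a minimal sketch of a SKIP decision that penalizes skipping blocks which are in motion. This only illustrates the general shape of such a rule; the function, the linear penalty, and the `motion_penalty` constant are hypothetical and are not taken from the Ptalarbvorm source.

```python
def choose_skip(sse_skip, sse_coded, bits_coded, lam,
                motion_magnitude, motion_penalty=0.5):
    """Return True if SKIP is cheaper than coding the block.

    sse_skip:         squared error if the block is left unchanged
    sse_coded:        squared error if the block is coded
    bits_coded:       bits needed to code the block
    lam:              rate-distortion multiplier (lambda)
    motion_magnitude: size of the motion through the block
                      (hypothetical input, e.g. in pixels)
    motion_penalty:   illustrative constant, not an encoder parameter
    """
    # SKIP is treated as costing no bits here, but its error is inflated
    # when the content is moving, since stale blocks stand out in motion.
    cost_skip = sse_skip * (1.0 + motion_penalty * motion_magnitude)
    cost_coded = sse_coded + lam * bits_coded
    return cost_skip <= cost_coded


static_block = choose_skip(100, 20, 10, 10.0, motion_magnitude=0.0)
moving_block = choose_skip(100, 20, 10, 10.0, motion_magnitude=2.0)
```

In this example the static block is skipped (cost 100 against 120), while the same block with motion through it is coded (cost 200 against 120).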
See the Ptalarbvorm SVN logs for descriptions of specific decoder optimization work.
A number of changes are in progress but have not yet landed in the Ptalarbvorm branch. They will be part of a final 1.2 release.
Robin Watts has been working under contract with Google to merge his ARM implementation of Theora (and Vorbis/Tremor) back into the Xiph codebases under a BSD license.
Similarly, David Schleef has been working with Mozilla to port Theora playback to the c64x DSP coprocessor on the OMAP3 SoC manufactured by Texas Instruments. Quoting his Leonara project page, "This series of SoCs is used in a variety of mobile devices including the Palm Pre, Motorola Droid, and Nokia's N series of phones. It has gained a following in the open source community because of the Beagle Board, which is an inexpensive introduction to embedded devices." Most of the devices featuring the c64x have enough CPU to play back Theora without assistance; however, the c64x can do so with far less battery usage.
These changes will land before the 1.2 release.
Further quantization matrix tuning is needed to:
Temporal RDO evaluates how long any bit expenditure remains important. For example, the pixels in a static scene background might appear to have no more value than foreground action when analyzing a single frame, but over time it becomes apparent that these pixels change very little, and so a relatively lavish bit expenditure up front pays off over many subsequent frames.
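The arithmetic behind this can be sketched as follows, assuming a block's error persists unchanged for every frame in which the block stays on screen. The cost model, the function names, and the numbers are illustrative assumptions, not the planned encoder design.

```python
def temporal_rd_cost(sse, bits, lam, frames_visible):
    """Bits are paid once; the error is 'seen' in every frame it persists."""
    return sse * max(1, frames_visible) + lam * bits


def best_choice(candidates, lam, frames_visible):
    """Pick the candidate name with the lowest temporal RD cost.

    candidates maps a name to a (sse, bits) pair.
    """
    return min(
        candidates,
        key=lambda name: temporal_rd_cost(*candidates[name], lam, frames_visible),
    )


# Two hypothetical ways to code the same background block.
candidates = {
    "coarse": (400, 50),   # cheap, but visibly lossy
    "fine": (100, 300),    # six times the bits, a quarter of the error
}

single_frame = best_choice(candidates, lam=2.0, frames_visible=1)
ten_frames = best_choice(candidates, lam=2.0, frames_visible=10)
```

Judged over a single frame, the coarse coding wins (cost 500 against 700). If the block is expected to stay unchanged for ten frames, the fine coding wins comfortably (1600 against 4100), which is the up-front expenditure paying off.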
At the moment, the main feature mentioned as landing post 1.2 is encoder multithreading.