Theora "The push for 1.0" update (2007)

Hello Slashdotters!

The document below was written as a stark, critical, unflinching, no-holds-barred evaluation of shortcomings in the current Theora encoder (http://svn.xiph.org/trunk/theora/lib/enc) to target specific points for improvement. It is 100% critical. That was its purpose. It does not discuss or mention Theora's positives, advantages or potential; it only talks about what's bad. Several of these 'bad' things, amusingly enough, are also true of supposed gold-standard codecs like MPEG-4 and h.264. "8x8 DCT blocks? In 2007? As the 'future of video coding'? Oh, please..."

I have not edited the document, because it still serves an internal purpose: it is the list of improvements we're making over the next few months. This document does *not* say 'Theora is inferior to h.261' (one of the idiot fake quotes floating around Slashdot) and does not say Theora is doomed or hopelessly obsolete. It says the current encoder is lacking compared to the very, very best. It certainly is. Yet the Theora format is entirely capable of closing the gap with H.264 and MPEG-4 in rate-distortion (R-D) performance while still requiring a fraction of the CPU time.

Actions speak louder than words. You can track our (already substantial) progress in the new encoder at http://svn.xiph.org/branches/theora-thusnelda/lib/enc/

Don't forget kids, this isn't a fight about *technology*. It's a fight about *control*.

Overview

"Theora" is the Ogg video codec built from the VP3 codec that On2 released to the open source world in 2002. Since then, the codebase has followed a path (a fate?) parallel to the browser code released to the world by Netscape in their final gasps. VP3's release attracted notice and some adoption but relatively little open source development. However, as it's clear that practical deployment of Dirac (or some other unencumbered next-generation video codec) is at least a few years away, several open source projects are finally in the process of actively building new codecs out of the VP3 code. Although VP3's technology is solidly previous-generation (it is similar in technology to MPEG2, but then so is MPEG4), there are several good years of improvement left in it. The basic techniques used in the code should remain competitive with MPEG4 and h.264 for a few years more.

This is not to say the VP3 code, as released, does not have a multitude of problems. Like the original Mozilla code drop, the code gives the impression that it was being 'blown up'; that is, a number of large updates were in-progress and left incomplete when the code base was abandoned.

The Ogg Project adopted the VP3 codec as 'Theora' in 2002 or 2003 (I don't even remember for sure). After an initial alpha release, the pace of development slowed dramatically only to pick up again in the past few months. Theora, as it stands now, has a number of serious performance issues. Fortunately, several of them count as "low-hanging fruit".

Performance problems

Unlike Vorbis and Speex, legitimate best-in-class codecs, Theora's coding quality is obviously poor relative to contemporary competition. This poor performance stems from both implementation and design deficiencies. As a separate problem, Theora is also poorly integrated with Ogg due to incomplete multiplexing software and documentation on the Ogg side. Without guidance from Xiph.Org, outside development and implementation of Theora-in-Ogg has been chaotic and of low quality.

Both of these shortfall categories must be addressed if we're to encourage use of Theora with a straight face.

Known codec problems

Loss of detail

The minimum allowed quantizer in VP3 is relatively large: 4x larger than in MPEG4 and more than an order of magnitude larger than in h.264. Exacerbating this, the encoder intentionally sets absurdly large static quantizer values for high-frequency (HF) components rather than employing an adaptive algorithm.
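A uniform scalar quantizer shows why a large minimum step erases detail. Any coefficient smaller than half the step rounds to zero, and the decoder cannot recover it. What follows is a minimal sketch. The coefficient values and step sizes are made-up illustrations, not VP3's quantizer tables, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform scalar quantization of one transform coefficient with step q,
   rounding the magnitude to the nearest level (ties away from zero). */
int quantize(int coef, int q) {
    assert(q > 0);
    int mag = (abs(coef) + q / 2) / q;
    return coef < 0 ? -mag : mag;
}

/* The decoder can only reconstruct integer multiples of q. */
int dequantize(int level, int q) { return level * q; }

/* Count how many of n coefficients survive quantization (nonzero level). */
int surviving_coefs(const int *coefs, int n, int q) {
    int alive = 0;
    for (int i = 0; i < n; i++) alive += quantize(coefs[i], q) != 0;
    return alive;
}
```

Take the sample coefficients {40, -12, 9, -6, 5, 3, -2, 1}, ordered roughly from low to high frequency. A step of 2 keeps all eight, a step of 16 keeps three, and a step of 32 keeps only the first.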

The result is that VP3 (and thus Theora) is known for losing virtually all fine detail, even in the highest-bitrate encoding modes.

Poor bitrate management

The bitrate manager as implemented in VP3 and carried into Theora is swift to anger, buggy, and largely responsible for the atrocious quality of managed-rate streams. This is purely an implementation issue.
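A controller that is 'swift to anger' overreacts to every expensive frame. A damped, buffer-based controller instead changes quantization gradually. What follows is a minimal sketch of a generic leaky-bucket scheme, not VP3's or Theora's algorithm. The 10% dead zone, the one-step-per-frame limit and all struct and function names are hypothetical choices.

```c
#include <assert.h>

/* Hypothetical leaky-bucket rate controller state. */
typedef struct {
    double bits_per_frame; /* target bits per frame: bitrate / fps */
    double buffer;         /* accumulated overshoot (+) or undershoot (-) */
    double buffer_size;    /* deviation, in bits, treated as "full" */
    int qi;                /* quality index: higher = finer quantization */
    int qi_min, qi_max;
} rate_ctl;

void rc_init(rate_ctl *rc, double bitrate, double fps,
             double buffer_seconds, int qi, int qi_min, int qi_max) {
    assert(bitrate > 0 && fps > 0 && buffer_seconds > 0);
    assert(qi_min <= qi && qi <= qi_max);
    rc->bits_per_frame = bitrate / fps;
    rc->buffer = 0.0;
    rc->buffer_size = bitrate * buffer_seconds;
    rc->qi = qi;
    rc->qi_min = qi_min;
    rc->qi_max = qi_max;
}

/* Account for a coded frame and return the quality index for the next one.
   The index moves by at most one step per frame, and only once the buffer
   has drifted past a 10% dead zone, so one expensive frame cannot cause a
   violent swing in quality. */
int rc_update(rate_ctl *rc, double frame_bits) {
    rc->buffer += frame_bits - rc->bits_per_frame;
    double fullness = rc->buffer / rc->buffer_size;
    if (fullness > 0.1 && rc->qi > rc->qi_min) rc->qi--;
    else if (fullness < -0.1 && rc->qi < rc->qi_max) rc->qi++;
    return rc->qi;
}
```

Consider a 580 kbps, 25 fps target. A single 100,000-bit frame lowers the quality index by one step. The index keeps easing down only while the buffer stays more than 10% over budget.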

Blocking artifacts

This is primarily due to the use of the raw DCT: a fundamentally obsolete transform space with drawbacks that VP3 inadequately mitigated. Coupled with naive and overly aggressive quantization, flat color surfaces and gentle gradients tend to disintegrate into an obvious pattern of flickering square blocks, even at high bitrates.
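The extreme case makes the mechanism easy to see. If quantization zeroes every AC coefficient in a block, the inverse DCT returns a constant block equal to the block's mean (its DC term). What follows is a minimal 1-D sketch, assuming 8-sample blocks and nonnegative sample values. The function names are illustrative.

```c
#include <assert.h>

#define BLK 8

/* Reconstruct a 1-D signal as if every AC coefficient of each 8-sample
   DCT block had been quantized to zero: each block collapses to its mean.
   Rounded integer mean; assumes nonnegative samples. */
void dc_only_reconstruct(const int *in, int *out, int n) {
    assert(n % BLK == 0);
    for (int b = 0; b < n; b += BLK) {
        int sum = 0;
        for (int i = 0; i < BLK; i++) sum += in[b + i];
        int mean = (sum + BLK / 2) / BLK;
        for (int i = 0; i < BLK; i++) out[b + i] = mean;
    }
}

/* Largest jump between neighbouring samples (a visible edge if large). */
int max_step(const int *x, int n) {
    int m = 0;
    for (int i = 1; i < n; i++) {
        int d = x[i] - x[i - 1];
        if (d < 0) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

Start with a gentle ramp 0, 2, 4, ..., 30, whose largest step is 2. It reconstructs as eight samples of 7 followed by eight samples of 23. The gradient becomes a single hard step of 16 at the block boundary.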

VP3 attempted to control the obvious blocking (and the HF energy that the hard block edges add to the coded error) by lowpassing the block edges, essentially blurring the blocks together on purpose as if smudged. The result was not the intended smooth transition but still-obvious squares with slightly blurred edges.
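What follows is a minimal sketch of this kind of edge smoothing. It assumes a simple 1-2-1 kernel applied to the two samples nearest a block edge. VP3's actual loop filter is defined differently, and the function name is hypothetical.

```c
#include <assert.h>

/* Smooth the two samples that straddle a block edge with a 1-2-1 kernel.
   p[0], p[1] lie left of the edge; p[2], p[3] lie right of it.
   Assumes nonnegative pixel values (integer division rounds them down). */
void smooth_edge(int *p) {
    assert(p != 0);
    int a = p[0], b = p[1], c = p[2], d = p[3]; /* read before writing */
    p[1] = (a + 2 * b + c + 2) / 4;
    p[2] = (b + 2 * c + d + 2) / 4;
}
```

Apply it to a blocky edge of 7, 7 | 23, 23. The result is 7, 11, 19, 23, with steps of 4, 8 and 4. The edge is softened but still clearly visible in the middle, which matches the 'squares with slightly blurred edges' outcome.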

The wall background to the right of the Agent is not a smooth gradient, but rather obviously very blocky, even when using the highest bitrates available in alpha libtheora.


VP3 therefore also postprocesses decoded output with a much more aggressive lowpass deblocking filter, applied only to portions of the image that meet the criterion 'probably started out as a flat field.'
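The 'probably started out as a flat field' test can be sketched as a gate in front of the filter. Smooth across an edge only when both sides are nearly constant and the jump is small enough to be a quantization artifact rather than a real edge. This is a minimal sketch, not VP3's postprocessor. The thresholds, the linear-ramp blend and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Does the edge between two n-sample runs look like blocking on a flat
   field? Both sides must vary by at most flat_limit between neighbours,
   and the jump across the edge must be below edge_limit. */
int looks_like_flat_field(const int *left, const int *right, int n,
                          int flat_limit, int edge_limit) {
    assert(n >= 2);
    for (int i = 1; i < n; i++) {
        if (abs(left[i] - left[i - 1]) > flat_limit) return 0;
        if (abs(right[i] - right[i - 1]) > flat_limit) return 0;
    }
    return abs(right[0] - left[n - 1]) < edge_limit;
}

/* If the gate passes, replace the two samples on each side of the edge
   with a linear ramp from a to b spread over five intervals. */
void deblock_if_flat(int *left, int *right, int n,
                     int flat_limit, int edge_limit) {
    if (!looks_like_flat_field(left, right, n, flat_limit, edge_limit)) return;
    int a = left[n - 1], b = right[0];
    left[n - 2] = a + (b - a) * 1 / 5;
    left[n - 1] = a + (b - a) * 2 / 5;
    right[0]    = a + (b - a) * 3 / 5;
    right[1]    = a + (b - a) * 4 / 5;
}
```

With flat runs of 7 and 23, an edge limit of 10 treats the jump of 16 as a real edge and leaves it alone. An edge limit of 20 turns it into the ramp 10, 13, 16, 19.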

This retrograde mitigation of blocking artifacts would be unnecessary if Theora simply used a prelapping filter along with the DCT, or a symmetric lapped wavelet basis such as the LeGall 5/3 or CDF 9/7 wavelets used by JPEG2000. Either change would require breaking the spec.
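For comparison, the reversible 5/3 wavelet of JPEG2000 can be computed with two integer lifting steps. Each output depends on samples from both neighbours, so there is no hard block boundary. A linear ramp also leaves zero detail coefficients away from the signal ends. What follows is a minimal 1-D sketch, assuming an even-length signal and whole-sample symmetric extension at the ends. The function names are illustrative.

```c
#include <assert.h>

/* floor(a / b) for b > 0; C's '/' truncates toward zero instead. */
static int floor_div(int a, int b) {
    int q = a / b;
    if (a % b != 0 && a < 0) q--;
    return q;
}

/* Forward 5/3 lifting on x[0..n), n even and >= 2: n/2 lowpass samples go
   to s, n/2 detail samples to d. Symmetric extension: x[n] = x[n-2] and
   d[-1] = d[0]. */
void fwd53(const int *x, int *s, int *d, int n) {
    assert(n >= 2 && n % 2 == 0);
    int h = n / 2;
    for (int i = 0; i < h; i++) {            /* predict odd samples */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        d[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    for (int i = 0; i < h; i++) {            /* update even samples */
        int left = (i > 0) ? d[i - 1] : d[0];
        s[i] = x[2 * i] + floor_div(left + d[i] + 2, 4);
    }
}

/* Inverse: undo the update step, then the predict step. Exact for ints. */
void inv53(const int *s, const int *d, int *x, int n) {
    assert(n >= 2 && n % 2 == 0);
    int h = n / 2;
    for (int i = 0; i < h; i++) {
        int left = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(left + d[i] + 2, 4);
    }
    for (int i = 0; i < h; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = d[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

On the ramp {0, 2, ..., 14}, the three interior detail coefficients are zero. Only the last is nonzero, because of the boundary extension. Any integer input, negatives included, reconstructs exactly.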

Lack of hierarchical decomposition

Theora uses a strictly single-level decomposition into 8x8 DCT blocks. The only cross-block redundancy exploited in the format is linear DC prediction. No lapping occurs between transform blocks.
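Linear DC prediction means predicting each block's DC value from already-coded neighbours and coding only the residual. What follows is a minimal sketch using a simplified, hypothetical predictor: the average of the left and upper neighbours. Theora's actual predictor weights neighbouring blocks according to rules in its specification.

```c
#include <assert.h>

/* Replace each block's DC value with its prediction residual. Blocks are
   stored row-major on a w x h grid. Hypothetical predictor: average of the
   left and upper neighbours when both exist, else whichever exists, else 0. */
void dc_residuals(const int *dc, int *res, int w, int h) {
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int i = y * w + x, pred;
            if (x > 0 && y > 0) pred = (dc[i - 1] + dc[i - w]) / 2;
            else if (x > 0)     pred = dc[i - 1];
            else if (y > 0)     pred = dc[i - w];
            else                pred = 0;
            res[i] = dc[i] - pred;
        }
    }
}
```

The decoder runs the same predictor on already-decoded DC values and adds each residual back. Small residuals are cheaper to code than raw DC values. The other 63 coefficients of each 8x8 block get no cross-block treatment.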

Motion Compensation Boundary Artifacts

Similar to the use of the 8x8 block DCT with hard edges, motion compensation in Theora is handled by block or macroblock, moving hard squares from one location to another, massively multiplying the number of hard edges (and HF component energy) throughout the image.
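Block motion compensation amounts to copying a square of pixels from the reference frame at a displaced position. What follows is a minimal sketch, assuming full-pixel vectors and a source block that lies inside the frame. Sub-pixel interpolation and frame-edge handling, which a real codec needs, are omitted, and the function name is hypothetical.

```c
#include <assert.h>

/* Form the 8x8 prediction for the block whose top-left corner is (bx, by)
   by copying from the reference plane displaced by the full-pixel vector
   (mvx, mvy). ref is a width x height plane with the given stride. */
void mc_copy_block(const unsigned char *ref, int stride, int width, int height,
                   int bx, int by, int mvx, int mvy, unsigned char out[8][8]) {
    int sx = bx + mvx, sy = by + mvy;
    /* This sketch requires the source block to lie inside the frame. */
    assert(sx >= 0 && sy >= 0 && sx + 8 <= width && sy + 8 <= height);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            out[y][x] = ref[(sy + y) * stride + sx + x];
}
```

Two neighbouring blocks with different vectors copy from unrelated parts of the reference. Their shared edge carries whatever discontinuity that produces, and the residual then has to pay for it.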

Inefficient coding

Theora currently uses a semantic-token based backend encoder with minimal entropy coding. This is a previous-previous generation scheme that would have been ill-considered even in VP3's early days, and it is stunningly inefficient in terms of bit usage. Timothy Terriberry estimates that a simple self-training range-coding backend could reasonably be expected to reduce bit usage overhead by 15-20%.
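The sketch below illustrates why a self-training backend can pay off. It computes the ideal cost of a symbol stream under an adaptive order-0 frequency model, which a simple adaptive range coder can approach closely, and compares it with a fixed-length code. This is a toy model, not Theora's token scheme, and it does not reproduce the 15-20% estimate. The names are hypothetical, and log2 is hand-rolled so the example needs no libm.

```c
#include <assert.h>
#include <stddef.h>

/* log2(x) for x > 0: scale x into [1, 2), then use
   ln(m) = 2 * atanh((m - 1) / (m + 1)) as a fast-converging series. */
static double log2_approx(double x) {
    assert(x > 0.0);
    int e = 0;
    while (x >= 2.0) { x /= 2.0; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }
    double t = (x - 1.0) / (x + 1.0), t2 = t * t, term = t, sum = 0.0;
    for (int k = 1; k < 40; k += 2) { sum += term / k; term *= t2; }
    return e + 2.0 * sum / 0.69314718055994530942;
}

/* Ideal cost in bits of sym[0..n) under an adaptive order-0 model: every
   symbol starts with count 1 and each coded symbol bumps its count, so the
   probabilities train themselves on the data. */
double adaptive_cost_bits(const unsigned char *sym, size_t n, int alphabet) {
    assert(alphabet > 0 && alphabet <= 256);
    unsigned counts[256];
    unsigned total = (unsigned)alphabet;
    for (int i = 0; i < alphabet; i++) counts[i] = 1;
    double bits = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(sym[i] < alphabet);
        bits -= log2_approx((double)counts[sym[i]] / total);
        counts[sym[i]]++;
        total++;
    }
    return bits;
}

/* Cost of the same symbols with a fixed-length code of `width` bits each. */
double fixed_cost_bits(size_t n, int width) { return (double)n * width; }
```

Take 1000 copies of the same symbol from a 16-symbol alphabet. A fixed 4-bit code spends 4000 bits. The adaptive model spends only about 110 bits, because it quickly learns that the stream is highly predictable.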

In addition to the relative inefficiency of the final backend coding, the overall coding strategy mostly treats 8x8 blocks as wholly uncorrelated entities and makes no credible structural attempt to eliminate interblock redundancy. There is no hierarchical or predictive coding of block metadata flags, motion vectors or (for the most part) block coefficient data.
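Motion vectors of neighbouring blocks are usually similar, and other codecs exploit this. MPEG-4 Part 2 and H.263, for example, predict each vector as the component-wise median of the left, above and above-right neighbours, then code only the difference. What follows is a minimal sketch of that median predictor, a technique from those codecs and not from Theora. The names are illustrative.

```c
#include <assert.h>

typedef struct { int x, y; } mv;

/* Median of three integers. */
static int median3(int a, int b, int c) {
    int lo = a < b ? a : b, hi = a < b ? b : a;
    return c < lo ? lo : (c > hi ? hi : c);
}

/* Component-wise median of the left, above and above-right vectors. */
mv predict_mv(mv left, mv above, mv above_right) {
    mv p = { median3(left.x, above.x, above_right.x),
             median3(left.y, above.y, above_right.y) };
    return p;
}

/* Only this difference would be entropy coded. */
mv mv_residual(mv actual, mv pred) {
    mv r = { actual.x - pred.x, actual.y - pred.y };
    return r;
}
```

Take neighbours (4, 1), (5, 0) and (20, 1). The prediction is (5, 1): the outlier 20 is ignored. A block moving by (6, 1) then costs only the small residual (1, 0).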

Demonstration

All of the above codec problems contribute to a general lack of bitrate performance that is easy to demonstrate. In 2005, the doom9 forum conducted a 'multi-codec shootout', the latest in a series of similar video codec comparisons they had published. For the first time, Theora was included in this test. Theora fared rather poorly, and its poor performance was genuinely indicative of Theora's difficulties.

One specific test sequence showed most of Theora's inadequacies working against it all at once. To begin, below is a still frame from the original video used to test the various codecs. This image is frame #2029 from chapter 28 of The Matrix, dumped from DVD, expanded from anamorphic NTSC 720x480 to 852x480, cropped to 2.35:1 (the original is letterboxed), and scaled to 640x272:

Frame 2029 from the original uncompressed 'Matrix' test sequence.
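The frame geometry described above works out as follows. This is plain arithmetic on the stated figures. The only added assumption is the 16:9 display aspect of anamorphic NTSC DVD.

```latex
\begin{aligned}
480 \times \tfrac{16}{9} &\approx 853.3 &&\text{(16:9 display width; the test used 852)}\\
852 \div 2.35 &\approx 362.6 &&\text{(rows kept by the 2.35:1 crop)}\\
640 \div 2.35 &\approx 272.3 &&\Rightarrow 640 \times 272
\end{aligned}
```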

Xvid was the overall winner of the codec shootout, but several other codecs in the comparison closely approached its overall quality. Below is the same frame number plucked from the stream as compressed by Xvid at 580kbps:

Frame 2029 from Xvid encoded 'Matrix' test sequence.

For comparison purposes, below is the same frame from the Theora 580kbps stream:

Frame 2029 from alpha 'Theora' encoded 'Matrix' test sequence.


In summary: "ugh". However, as a demonstration of just how much can be gained by short-term fixes to a few of the more egregious problems, here is the improvement to the Theora result obtained simply by fixing the original, broken rate management. Below is another 580kbps Theora stream, this time with more intelligent rate allocation:

Frame 2029 from 'Matrix' test sequence, encoded using a modified Theora alpha with hand-tweaked rate control.


Still not perfect and still among the lower-performing codecs, but no longer solidly with the losers. Quantization noise and loss of detail are still quite apparent compared to the other codecs in the test, but the result is no longer embarrassing.

Known integration/usability issues

Lack of complete metastructure specification

Lack of reference multiplexing infrastructure

This point can most easily be demonstrated by the poor A/V sync exhibited by Ogg support in various contemporary players. The following test vectors are each synthetically produced and assembled to ensure pedantically correct audio and video synchronization. Each, however, tests a different legal option in the exact multiplexing scheme. The correct behavior is a 'beep' that begins when the black square appears on the screen and ends when it disappears. Yes, the bitrate is high for something so simple; it is simulating 'normal' streams.
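The correct behavior amounts to a fixed mapping between video frame times and audio sample positions. A player that gets the beep wrong has broken that mapping somewhere. What follows is a minimal sketch, assuming a constant frame rate given as a fraction and a constant audio sample rate. The helper names are hypothetical and are not part of the Ogg, Theora or Vorbis APIs; Ogg's own granule positions encode time differently for each codec.

```c
#include <assert.h>
#include <stdint.h>

/* First audio sample that plays while video frame `frame` is shown,
   for fps = fps_num / fps_den and `rate` samples per second (rounded down).
   Integer math keeps the mapping exact for fractional rates like NTSC. */
int64_t frame_to_sample(int64_t frame, int64_t fps_num, int64_t fps_den,
                        int64_t rate) {
    assert(fps_num > 0 && fps_den > 0 && rate > 0);
    return frame * fps_den * rate / fps_num;
}

/* Inverse: the video frame on screen when audio sample `sample` plays. */
int64_t sample_to_frame(int64_t sample, int64_t fps_num, int64_t fps_den,
                        int64_t rate) {
    assert(fps_num > 0 && fps_den > 0 && rate > 0);
    return sample * fps_num / (fps_den * rate);
}
```

At 30000/1001 fps with 48 kHz audio, frame 30 begins at sample 48048. A beep that starts with a square appearing on that frame must therefore start at that sample.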