Theora: Thusnelda project update 20090402

Overview and two releases!

Since the last demo/update at the end of last July, quite a lot has happened in the Theora/Thusnelda sourcebases. The two biggest events were Theora's full 1.0 release in early November and an alpha release of libTheora 1.1 including the Thusnelda encoder. Because I write these pages to correspond to updates I deliver to my department at Red Hat, and I was temporarily transferred off Thusnelda work after 'demo5' (right before I was due to deliver the next demo), I hadn't written any new update pages in quite a while.

In February, Tim Terriberry took over primary development duties on Thusnelda, picking up where I left off in November. Tim's development contract and the final release push were made possible by a grant from Wikimedia and Mozilla (and let's not forget that my own year of development time was donated by Red Hat, but there wasn't any Slashdot story about that ;-), and I'm going to take the opportunity again to thank these organizations for their generosity.

Theora 1.0 release

Theora 1.0 [finally] happened on Nov. 3rd, 2008, a little over four years after bitstream freeze. Between the original VP3 code grant from On2 and the Theora bitstream freeze, we wrote a complete spec, added specifiable entropy codebooks and quantization matrices, added 4:4:4 and 4:2:2 pixel formats, slightly modified the 4MV motion vector modes and otherwise 'un-hardwired' several setup parameters in the bitstream spec, as well as specified an Ogg mapping for the packet payload. The codebase itself was originally refactored as little as necessary to make it more maintainable. However, once the new bitstream features were added, Tim Terriberry wrote a higher-performance decoder more appropriate to modern CPU architectures; this was added in the Theora beta releases.

Between beta and final release, the codebase saw primarily bug fixes and build system changes.

Theora 1.1 first alpha release: Thusnelda

Originally scheduled for Nov 25th 2008, the release slipped approximately a week, long enough to see me temporarily transferred off the project, which delayed the release a bit further. Before I returned to Xiph development this past month, Tim Terriberry was able to pick up the ongoing Thusnelda development work and complete the release (his release also has a few extra features filled in that I was going to get to after first alpha). libTheora 1.1 first alpha (Thusnelda) shipped March 27, 2009.

A few of the original items on our list of 'easy' Theora improvements remain, but at this point we've tackled and delivered the lion's share of the work. Although the code is alpha and requires more testing, Thusnelda is already more feature-complete and higher performance on every metric than the original Theora encoder.

Other ongoing Theora work

It's worth mentioning two other contracts that have come out of the Wikimedia/Mozilla grant, specifically Conrad Parker's and Viktor Gal's work within liboggplay, liboggz, and libfishsound.

Viktor has optimized RGB/YUV conversion code for several architectures as well as slain a host of liboggplay bugs.

Conrad has been hitting bugs so hard they just... disappear. [sorry, inside joke]

Thusnelda improvements since demo5

token/skip lambda unification

The latest new work completed for demo5 was per-token rate-distortion optimization. To review, that was a process where the bit cost of coding each DCT coefficient in a block was weighed against the distortion cost of leaving it out (or altering it to some cheaper-to-encode value). The same tradeoff, controlled by the value lambda, was taken into account when deciding whether to code a block/macroblock at all, but the two lambda values were inconsistent and had to be tuned separately.

In the Thusnelda design, the two values should be the same. After tracking down scaling and calculation inconsistencies, the two values were unified in early December (the last work I was able to do on Thusnelda before handing over to Tim).
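As a rough illustration of why a single lambda is convenient, here is a minimal sketch of the usual Lagrangian form of this tradeoff, cost = distortion + lambda * bits, applied to both a token-level and a block-level decision. This is not libtheora code; the function names and all the numbers are made up for the example.

```python
# Illustrative only: names and numbers are hypothetical, not libtheora code.

def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lam * bits

def choose(candidates, lam):
    """Return the (label, distortion, bits) candidate with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))

# Token-level decision: keep a coefficient as-is, or zero it out.
token_options = [("keep", 0.0, 9), ("zero", 16.0, 1)]
# Block-level decision: code the block, or skip it entirely.
block_options = [("code", 40.0, 120), ("skip", 400.0, 1)]

# The same lambda drives both decisions.
for lam in (1.0, 4.0):
    token = choose(token_options, lam)[0]
    block = choose(block_options, lam)[0]
    print(f"lambda={lam}: token -> {token}, block -> {block}")
```

With the small lambda both decisions favor spending bits ("keep", "code"); with the larger one both favor saving bits ("zero", "skip"). When the two decisions use differently scaled lambdas, they can pull in opposite directions, which is why each had to be tuned by hand.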

lambda/qi mapping and rate/distortion modeling

In order for a video encoder to be easy to use, it needs simple tuning knobs to effect 'simple' behavioral changes, such as a single master adjustment for output bitrate or constant output quality, although this might internally translate into hundreds of parameter changes.

With the unification of all the rate-distortion lambdas into a single value, output quality is governed by two master values: the quantizer index (in Theora, an integer between 0 and 63) and the rate-distortion lambda. Although the two values are not fundamentally tied to one another, an efficient encoder chooses lambda and qi such that the output distortion is minimized for a given bitrate, much the same way token or block coding decisions are made by trading off bitrate and distortion costs.

Once it was possible to choose an appropriate lambda for a given qi and vice versa, Thusnelda finally had a single master knob to allow a user to set a constant quality mode. Constant and average bitrate modes (rate control) are implemented in terms of varying qi according to predicted bit usage and distortion over a set of frames. Thusnelda, as it turns out, follows classical predictions for bit usage according to 'rho-domain analysis', which predicts that bit usage in any given frame is roughly linearly related to the number of nonzero DCT coefficients in that frame.
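To make the linear bit-usage prediction concrete, here is a small sketch that fits bits ~= theta * (nonzero coefficient count) to a few past frames and uses the slope to predict the next one. The helper names and the frame statistics are invented for illustration; this is not how Thusnelda's rate control is actually written.

```python
# Illustrative only: helper names and frame statistics are hypothetical.

def fit_slope(nonzero_counts, bits_used):
    """Least-squares slope through the origin for bits ~= theta * nonzero."""
    num = sum(n * b for n, b in zip(nonzero_counts, bits_used))
    den = sum(n * n for n in nonzero_counts)
    return num / den

def predict_bits(theta, nonzero_count):
    """Predict frame size from the number of nonzero DCT coefficients."""
    return theta * nonzero_count

# Made-up history: nonzero coefficients and bits spent in three past frames.
history_nonzero = [1000, 2000, 4000]
history_bits = [5000, 10000, 20000]

theta = fit_slope(history_nonzero, history_bits)
print("bits per nonzero coefficient:", theta)
print("predicted bits for 3000 nonzero coefficients:", predict_bits(theta, 3000))
```

A rate controller built on a model like this would pick the qi whose expected nonzero count lands the frame near its bit budget.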

However, the rho-domain distortion modelling predictions do not hold for Thusnelda. Sadly, I'll have to leave the details of that up to Tim to explain, as the very last work I started on Thusnelda was establishing that the model didn't apply, and then making it Tim's problem to fix it.

rate management

Tim's last work before the Thusnelda alpha release was implementing a rate control mechanism on top of the finished parameter unification and rate/distortion modelling work. This is the newest code in Thusnelda and the most alpha. Although it already works considerably better than the encoder rate control in 1.0 (for one thing, it successfully controls the rate, which 1.0 doesn't), it is only at the proof-of-concept stage and, much like 1.0, severely overreacts to bitrate changes. Many of the rate-control-caused artifacts highlighted on the original demo page also appear with the new rate control, and it must be improved to be usable in a final release.

Upcoming improvements

Original bullet points

Of the original improvements suggested for the Theora encoder that could be implemented within the frozen bitstream, the only remaining item on the original list is quant matrix improvements.

The original bulletpoint list of improvements also stressed integration work; this will be the next major stage of work.

quant matrices

Early testers have noticed that Thusnelda still causes a softening of edges, even at surprisingly high bitrates. This isn't surprising as the softness is 95% caused by the quantization matrices used by VP3 and Thusnelda has not yet changed/improved those matrices.

more accurate FDCT

...the last 5% of softness is caused by VP3's leaky forward DCT, which was purposely designed to pre-quantize values toward zero (a cheap and lazy way to reduce bitrate in an encoder with no real R/D optimization). Now that Thusnelda has a full R-D optimization engine, this 'feature' is solidly a flaw that causes detail to disappear needlessly. This is still on the 'to fix' list.
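The effect of a bias toward zero is easy to see with a toy example. The sketch below is not VP3's FDCT; it just compares rounding to nearest against truncating toward zero on a handful of invented coefficient values, with a made-up step size, to show how the bias shrinks magnitudes and creates extra zeros.

```python
# Illustrative only: not VP3's transform; values and step size are made up.

def quant_round(coeff, step):
    """Quantize to the nearest multiple of step (ties away from zero)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)

def quant_toward_zero(coeff, step):
    """Quantize with a bias toward zero; int() truncates toward zero."""
    return int(coeff / step)

coeffs = [7, -7, 12, -3, 20]
step = 8

rounded = [quant_round(c, step) for c in coeffs]
biased = [quant_toward_zero(c, step) for c in coeffs]

print("round to nearest:", rounded)
print("toward zero:     ", biased)
```

The biased version zeroes or shrinks every value that rounding would have kept, which saves bits but throws away detail regardless of whether the R-D tradeoff says it was worth keeping.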

4:2:2 and 4:4:4

90% or more of the work to support 4:4:4 and 4:2:2 pixel formats is complete (and all new Thusnelda infrastructure was designed with these formats in mind). The last bit of code cleanup to fully support these additional YUV chroma pixel formats remains.

Monty