Monty's Vorbis surround work update 20100311

Overview

The last update was mostly concerned with tools, infrastructure, identifying problems likely to affect implementation, summarizing the data that needed to be collected to make informed coupling strategy decisions, and beginning Vorbis surround optimization work.

This week's update presents the data, the analysis, and a complete 'first cut' implementation of optimized surround for the reference Vorbis encoder. This first cut is not quite a release. The new surround optimization is fairly useless without the libao and vorbis-tools work (described in the last update), and that work is not yet ready for release. In addition, new quantization and noise normalization data collection and testing are still slated for future work. This 'first cut' implements surround support by generalizing and extending the quantization and noise normalization behaviors of the current mainline Vorbis encoder. In short, I'm changing one set of variables at a time.

There's also been a substantial code cleanup in the last stage of the Vorbis encode loop since the last update. Vorbis was originally a research codec that was 'discovered' by Slashdot a bit earlier than I'd intended, and in some ways the encoder was rushed to release years and years ago. Although it's well tested, an extensive internal cleanup never happened for fear of breaking working code. The reference encoder still contains a ton of unnecessary pathways that have never been used (and were never written into the spec). It also contains highly convoluted general-case algorithms for tasks where the encoder usually needs only the simplest case.

Coupling strategy

Listening tests and long training runs were done on several 5.1 movie soundtracks (uncompressed masters and decompressed commercial DVDs) for numerous basic coupling strategies up to depth 3.

Perceived effect

Coupling, as currently implemented in the encoder, is a mix of point/elliptical and lossless coupling. No partial phase modes are currently supported. The encoder is more aggressive about removing imaging information at higher frequencies. Take a stereo pair as an example. The Vorbis reference encoder removes more imaging information as it increases its compression aggressiveness. The perceived effect is mostly that diffuse noise and background hiss drift gently and progressively toward the center of the image. Regardless of the aggressiveness level, the encoder does not remove any imaging information from tonal content or high-energy portions of the spectrum. However, the threshold for what counts as high energy is relaxed as the quality mode decreases.
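Vorbis I channel coupling uses 'square polar' mapping, which turns a channel pair into a magnitude and an angle. The sketch below may help make the lossless and point cases concrete. The decoder step follows the inverse mapping given in the Vorbis I specification. The encoder step is one forward mapping consistent with that inverse. The function names are hypothetical, and point stereo is shown only in its crudest form: keep the magnitude and zero the angle. Zeroing the angle collapses both channels onto the magnitude, which is the pull toward the center described above.

```python
# Sketch of Vorbis I square polar coupling on integer (quantized) residue values.
# uncouple() follows the spec's inverse mapping; couple() is a forward mapping
# consistent with it. Function names are illustrative, not libvorbis API.


def couple(a: int, b: int) -> tuple[int, int]:
    """Map a channel pair (a, b) to (magnitude, angle) losslessly."""
    if abs(a) > abs(b):
        mag = a
        ang = a - b if a > 0 else b - a
    else:
        mag = b
        ang = a - b if b > 0 else b - a
    return mag, ang


def uncouple(mag: int, ang: int) -> tuple[int, int]:
    """Inverse square polar mapping (Vorbis I spec); returns (a, b)."""
    if mag > 0:
        if ang > 0:
            return mag, mag - ang
        return mag + ang, mag
    if ang > 0:
        return mag, mag + ang
    return mag - ang, mag


def point_stereo(a: int, b: int) -> tuple[int, int]:
    """Crude lossy 'point' coupling: keep the magnitude, discard the angle."""
    mag, _ = couple(a, b)
    return uncouple(mag, 0)  # both channels come back equal to mag


print(couple(10, 9), uncouple(*couple(10, 9)), point_stereo(10, 4))
```

Because the mapping is exact on integers, lossless coupling only changes how the values are distributed. Any bitrate gain comes from the angle channel being small when the two channels are similar.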

Because surround is implemented in terms of pairwise couplings, the effect of aggressive surround compression is similar to that in stereo. In addition, as the quality mode drops and the encoder leans more heavily on point/elliptical stereo, the stability of the surround image also decreases in proportion to other fidelity losses.

I've compressed the two short clips used in the previous demo at a series of decreasing quality (-q) settings to illustrate this effect. In the left column, each clip is compressed at the given -q level but completely uncoupled, so there's no surround image degradation. In the right column, the clip is lossily coupled, so the imaging degrades gracefully along with other aspects of reproduction fidelity:

(See the previous demo page for test samples to verify your surround setup is functioning properly.)

Tron closing credits clip

original clip [4068 kbps]

-q 6 lossless coupling [379 kbps]

-q 5 uncoupled [399 kbps] -q 5 lossy coupling [331 kbps]

-q 4 uncoupled [364 kbps] -q 4 lossy coupling [238 kbps]

-q 3 uncoupled [349 kbps] -q 3 lossy coupling [198 kbps]

-q 2 uncoupled [304 kbps] -q 2 lossy coupling [157 kbps]

-q 1 uncoupled [268 kbps] -q 1 lossy coupling [124 kbps]

-q 0 uncoupled [195 kbps] -q 0 lossy coupling [98 kbps]

-q -1 uncoupled [130 kbps] -q -1 lossy coupling [61 kbps]

Sita Sings the Blues clip

original clip [4068 kbps]

-q 6 lossless coupling [562 kbps]

-q 5 uncoupled [561 kbps] -q 5 lossy coupling [462 kbps]

-q 4 uncoupled [498 kbps] -q 4 lossy coupling [355 kbps]

-q 3 uncoupled [459 kbps] -q 3 lossy coupling [302 kbps]

-q 2 uncoupled [422 kbps] -q 2 lossy coupling [254 kbps]

-q 1 uncoupled [375 kbps] -q 1 lossy coupling [195 kbps]

-q 0 uncoupled [298 kbps] -q 0 lossy coupling [157 kbps]

-q -1 uncoupled [217 kbps] -q -1 lossy coupling [115 kbps]

(Yes, that's really a 5.1 encode of the Tron clip in 61 kbps. It's a particularly easy clip to encode.)

It's interesting to note how much further coupling compresses the audio. At -q 0 and below, lossy coupling roughly halves the bitrate of both clips. This is partly because more information is discarded and partly because the entropy encoding backend works on groups of channels rather than handling each channel independently.
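The ratio of coupled to uncoupled bitrate can be read straight off the sample tables. This short script only re-derives numbers already listed above. The -q 6 rows are omitted because they have no uncoupled counterpart.

```python
# Bitrates (kbps) copied from the sample tables above:
# quality -> (uncoupled, lossy coupling).
TRON = {5: (399, 331), 4: (364, 238), 3: (349, 198), 2: (304, 157),
        1: (268, 124), 0: (195, 98), -1: (130, 61)}
SITA = {5: (561, 462), 4: (498, 355), 3: (459, 302), 2: (422, 254),
        1: (375, 195), 0: (298, 157), -1: (217, 115)}


def coupled_fraction(table: dict[int, tuple[int, int]]) -> dict[int, float]:
    """Lossy-coupled bitrate as a fraction of the uncoupled bitrate at each -q."""
    return {q: coupled / uncoupled for q, (uncoupled, coupled) in table.items()}


for name, table in (("Tron", TRON), ("Sita", SITA)):
    for q, frac in sorted(coupled_fraction(table).items(), reverse=True):
        print(f"{name} -q {q:>2}: {frac:.0%} of uncoupled")
```

At -q 5 the coupled streams come to about 83% (Tron) and 82% (Sita) of the uncoupled bitrate. At -q 1 and below, they are roughly half.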

Fully uncoupled vs. coupled bitrates for the test samples above. Below -q 5, the Vorbis reference encoder progressively drops more surround image data. From -q 5 upward, explicit coupling is deactivated and bitrate is reduced primarily through an entropy backend that encodes vectors bundled together.

Contribution of explicit coupling

How much of the bitrate gain above is purely the result of pairwise coupling? Which coupling order produces the best results? To answer these questions, I simply tried most of the pairwise combinations using an uncoupled entropy coder. This does not give any absolute sense of the entropy/energy compaction achieved by a given combination. It should, however, give a good first-order relative ranking of which technique works best, and when. (There are a few potentially useful combinations I've not yet tried; more on this in the 'todo' section.)

Coupling arrangements tested include:

No coupling at all
[L*R, BL*BR] : depth 1 coupling of Front L with Front R, and Back L with Back R
[L*BL, R*BR] : depth 1 coupling of Front L with Back L, and Front R with Back R
[L*R=>X, BL*BR=>Y, X*Y=>Z] : depth 2; the [L*R, BL*BR] pairs above, then coupling the front result with the back result
[L*BL=>X, R*BR=>Y, X*Y=>Z] : depth 2; the [L*BL, R*BR] pairs above, then coupling the left result with the right result
[L*R=>W, W*C=>X, BL*BR=>Y, X*Y=>Z] : depth 3; center coupled with the front pair
[L*R=>W, BL*BR=>X, W*X=>Y, Y*C=>Z] : depth 3; center coupled with the combined front/back result
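The arrangements above can be sketched as ordered lists of pairwise steps, applied in order when encoding and undone in reverse when decoding. The Vorbis I decoder walks its coupling steps in reverse in the same way. This is a hypothetical model, not encoder code. It assumes that the result of each 'A*B=>X' step stays in A's slot, so later steps refer to it by A's name. It models only the lossless path, and it leaves LFE uncoupled, as none of the listed schemes touch it. The square polar mapping is restated so the block stands alone.

```python
# Hypothetical model of depth-N coupling schemes as ordered (magnitude, angle)
# channel steps. Assumption: 'A*B=>X' stores its result in A's slot.
# Only lossless coupling is modeled; LFE is never coupled.


def couple(a: int, b: int) -> tuple[int, int]:
    """Forward square polar mapping: (a, b) -> (magnitude, angle)."""
    if abs(a) > abs(b):
        return a, (a - b if a > 0 else b - a)
    return b, (a - b if b > 0 else b - a)


def uncouple(mag: int, ang: int) -> tuple[int, int]:
    """Inverse square polar mapping as given in the Vorbis I spec."""
    if mag > 0:
        return (mag, mag - ang) if ang > 0 else (mag + ang, mag)
    return (mag, mag + ang) if ang > 0 else (mag - ang, mag)


SCHEMES = {
    "depth 1: L*R, BL*BR": [("L", "R"), ("BL", "BR")],
    "depth 1: L*BL, R*BR": [("L", "BL"), ("R", "BR")],
    "depth 2: L*R=>X, BL*BR=>Y, X*Y=>Z": [("L", "R"), ("BL", "BR"), ("L", "BL")],
    "depth 2: L*BL=>X, R*BR=>Y, X*Y=>Z": [("L", "BL"), ("R", "BR"), ("L", "R")],
    "depth 3: L*R=>W, W*C=>X, BL*BR=>Y, X*Y=>Z":
        [("L", "R"), ("L", "C"), ("BL", "BR"), ("L", "BL")],
    "depth 3: L*R=>W, BL*BR=>X, W*X=>Y, Y*C=>Z":
        [("L", "R"), ("BL", "BR"), ("L", "BL"), ("L", "C")],
}


def encode(frame: dict[str, int], steps: list[tuple[str, str]]) -> dict[str, int]:
    out = dict(frame)
    for m, a in steps:  # couple in the listed order
        out[m], out[a] = couple(out[m], out[a])
    return out


def decode(frame: dict[str, int], steps: list[tuple[str, str]]) -> dict[str, int]:
    out = dict(frame)
    for m, a in reversed(steps):  # undo in reverse order
        out[m], out[a] = uncouple(out[m], out[a])
    return out


frame = {"L": 7, "C": -3, "R": 5, "BL": 2, "BR": -1, "LFE": 9}
for name, steps in SCHEMES.items():
    assert decode(encode(frame, steps), steps) == frame, name
```

Deeper schemes change which values end up small, not whether the mapping is reversible. That is why the comparison below has to be made empirically on real material.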

Graphs showing the result of 'compressing' the input samples using normal psychoacoustics plus each pairwise coupling scheme described above. The bitstream is then written with the uninterleaved, uncoupled backend from the uncoupled coding modes, so that the relative benefit of pairwise coupling can be seen separately from that of the entropy backend.

Each sample is a 10-minute clip from the beginning of a movie (Tron, Sita Sings the Blues, Terminator 2, and Moulin Rouge) processed at each integer quality level. The graph expresses the resulting bitrate of each coupling strategy as a percentage of an uncoupled run.

An example of each coupling strategy in practice can be heard below, processed at quality setting 1:

Tron closing credits clip

uncoupled clip

depth 1: L*R, BL*BR

depth 1: L*BL, R*BR

depth 2: L*R=>X, BL*BR=>Y, X*Y=>Z

depth 2: L*BL=>X, R*BR=>Y, X*Y=>Z

depth 3: L*R=>W, W*C=>X, BL*BR=>Y, X*Y=>Z

depth 3: L*R=>W, BL*BR=>X, W*X=>Y, Y*C=>Z

Coupling moves from lossy to lossless between quality settings 4 and 6. At 6, coupling is fully lossless unless bitrate management is in use. At present, the Vorbis reference encoder uses only one coupling setup for an entire encoding run, and the results show that lossless coupling is of relatively little use. Because the encoder cannot enable it selectively, only where it actually helps, it hurts as often as it helps. Given the current 'one strategy for the whole run' scheme, we switch off explicit pairwise coupling above quality setting 4. At those settings we rely on coupled codebook design to realize bitrate gains.
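A toy calculation can show why a single global choice is costly. The cost measure below is a crude stand-in for coding cost: bits for the magnitude plus a sign bit. The per-block switch is hypothetical; the text above says the current encoder uses one setup per run. The sketch compares coupling everything, coupling nothing, and picking the cheaper option for each block.

```python
# Illustrative sketch only (not what the encoder does today): lossless coupling
# chosen once for a whole run vs. a hypothetical per-block choice, using a crude
# cost proxy in place of real entropy coding.


def couple(a: int, b: int) -> tuple[int, int]:
    """Forward square polar mapping: (a, b) -> (magnitude, angle)."""
    if abs(a) > abs(b):
        return a, (a - b if a > 0 else b - a)
    return b, (a - b if b > 0 else b - a)


def cost(x: int) -> int:
    """Crude cost proxy: bits for |x| plus a sign bit when x is nonzero."""
    return abs(x).bit_length() + (1 if x else 0)


def block_cost(left: list[int], right: list[int], coupled: bool) -> int:
    pairs = zip(left, right)
    if coupled:
        pairs = (couple(a, b) for a, b in pairs)
    return sum(cost(x) + cost(y) for x, y in pairs)


def compare(blocks):
    """Return (global uncoupled, global coupled, best choice per block)."""
    uncoupled = sum(block_cost(lt, rt, False) for lt, rt in blocks)
    coupled = sum(block_cost(lt, rt, True) for lt, rt in blocks)
    per_block = sum(min(block_cost(lt, rt, False), block_cost(lt, rt, True))
                    for lt, rt in blocks)
    return uncoupled, coupled, per_block


blocks = [
    ([10, 12, -8, 7], [9, 11, -8, 6]),  # correlated pair: coupling helps
    ([10, -12, 8], [-10, 12, -8]),      # anti-phase pair: coupling hurts
]
print(compare(blocks))  # (68, 58, 55)
```

In this contrived case coupling the whole run still beats not coupling at all. Choosing per block beats both fixed choices, because the anti-phase block is cheaper uncoupled.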

Contribution of entropy backend

Testing above suggests that lossless and near-lossless pairwise coupling is close to a wash when applied globally. Yet the earlier 'stereo abuse' experiment from the first demo always showed some improvement over uncoupled streams. Above quality setting 4, no coupling strategy without a coupled backend showed any energy compaction. This suggests that a coupled entropy backend is consistently useful, even in the absence of pairwise coupling. We therefore implement a multiple-of-dimension-5 VQ backend that encodes interleaved channels, and disable pairwise coupling at q5 and above.
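Channel interleaving of this kind can be sketched in the style of Vorbis residue type 2, where sample i of channel c sits at index i * channels + c. The interleaved vector is then cut into fixed-size VQ vectors. The dimension here is just a parameter. The five-channel, dimension-5 example is one possible reading of 'multiple-of-dimension-5', with each vector spanning one bin across five channels. Treat it as an assumption, not the backend's actual layout.

```python
# Sketch of residue-2-style channel interleaving followed by fixed-size VQ
# vector grouping. 'dim' and the 5-channel example are illustrative assumptions.


def interleave(chans: list[list[int]]) -> list[int]:
    """Place sample i of channel c at index i * len(chans) + c."""
    n = len(chans[0])
    if any(len(c) != n for c in chans):
        raise ValueError("all channels must have the same length")
    return [chans[c][i] for i in range(n) for c in range(len(chans))]


def deinterleave(vec: list[int], channels: int) -> list[list[int]]:
    """Undo interleave(): every channels-th value belongs to one channel."""
    return [vec[c::channels] for c in range(channels)]


def vq_vectors(vec: list[int], dim: int) -> list[list[int]]:
    """Split the interleaved vector into consecutive VQ vectors of length dim."""
    if len(vec) % dim:
        raise ValueError("vector length must be a multiple of dim")
    return [vec[i:i + dim] for i in range(0, len(vec), dim)]


# Five channels of four bins each; channel c holds values 10*c + i.
chans = [[c * 10 + i for i in range(4)] for c in range(5)]
print(vq_vectors(interleave(chans), 5))
```

With the dimension equal to the channel count, each codeword sees the same bin across all channels at once. This is where a coupled backend can exploit inter-channel redundancy without any explicit pairwise coupling.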

Contribution of the entropy coding backend, measured on four ten-minute test clips. The hybrid pairwise coupling strategy is equivalent to the lower edge of the green and red lines. The blue lines represent the bitrate after coding through the entropy backend. The distance between the two is the contribution of the entropy coder.

Final results

Well, final for now.

The reductions realized by the new channel coupling always improve on both uncoupled encoding and last month's 'stereo abuse' strategy. The most interesting region is q4 and below, where the new coupling can produce bitrates less than half those of the previous release. The percentage improvement shrinks as the quality setting climbs. Even so, a fairly constant absolute improvement of roughly 100 kbps is realized all the way to q10.

Graphs directly comparing the results of uncoupled encoding, 'stereo abuse' coupling, and our 'final' first cut code.

The following samples demonstrate the difference between the encoders at each quality setting.

Tron closing credits clip

uncoupled -1 0 1 2 3 4 5 6 7 8 9 10

stereo abuse -1 0 1 2 3 4 5 6 7 8 9 10

final -1 0 1 2 3 4 5 6 7 8 9 10

Sita Sings the Blues clip

uncoupled -1 0 1 2 3 4 5 6 7 8 9 10

stereo abuse -1 0 1 2 3 4 5 6 7 8 9 10

final -1 0 1 2 3 4 5 6 7 8 9 10

Todo

Noise norm and quantization

Eventually it will be time to explore and refine the noise normalization model. The 'first cut' deliberately changed as little as possible at once, rather than striking out in multiple new directions simultaneously. Even so, the generalization of the 'classic' noise norm and quantization behavior was done with the needs of future work in mind.

Push out the release

vorbis-tools is ready to go, but libao still needs 1) a finished/updated Mac OS X driver (nearly ready), 2) more sanitizing of the available debug information, and 3) consistent, documented concurrency behavior. For example, what should happen if we try to open a device that is blocked by another application? Fail? Block? It's a simple question, but one for which ao currently has no consistent or documented answer.

Documentation!

The surround work involved dusting off and firing up the Vorbis codebook training tools for the first time in a very long time. Between codebook construction and encoder setup, there's a ton of undocumented process. I know folks have tried to play with it before in the distant past and generally failed. Shedding some light, any light at all, on the arcane machinery is probably a fine idea.

Monty's Vorbis surround coupling work is sponsored by Red Hat Emerging Technologies.
This page is Copyright (C) 2010 Monty