Last update was mostly concerned with tools, infrastructure, identifying problems likely to affect implementation, summarizing the data that needed to be collected to make informed coupling strategy decisions, and beginning Vorbis surround optimization work.
This week's update presents the data, analysis, and a complete 'first cut' implementation of optimized surround for the reference Vorbis encoder. This first cut is not quite a release; the new surround optimization is fairly useless without the libao and vorbis-tools work (described in the last update), and that's not yet quite ready for release. In addition, new quantization and noise normalization data collection and testing are still slated for future work. This 'first cut' implements surround support by generalizing and extending the quant and noise norm behaviors as implemented in the current mainline Vorbis encoder. In short, I'm changing one set of variables at a time.
There's also been a substantial code cleanup in the last stage of the Vorbis encode loop since the last update. Vorbis was originally a research codec that was 'discovered' by Slashdot a bit earlier than I'd intended. In some ways the encoder was rushed to release years and years ago. Although it's well tested, an extensive internal cleanup never happened for fear of breaking working code. The reference encoder still contains a ton of unnecessary pathways that have never been used (and were not written into the spec) as well as highly convoluted general case algorithms for performing tasks for which the encoder often needs only the simplest possibility.
Listening and long test training runs were done using several 5.1 movie soundtracks (uncompressed masters and decompressed commercial DVDs) for numerous basic coupling strategies up to depth 3.
Coupling, as currently implemented in the encoder, is a mix of point/elliptical and lossless coupling. No partial phase modes are currently supported. The encoder is more aggressive about removing imaging information at higher frequencies. Using a stereo pair as an example, as the Vorbis reference encoder increases compression aggressiveness and thus the amount of imaging information it removes, the perceived effect is mostly for diffuse noise and background hiss to progressively move gently toward the center of the image. The encoder does not remove any imaging information from tonal content or high-energy portions of the spectrum regardless of the aggressiveness level; however, the definition of 'high energy' is relaxed as the quality mode decreases.
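As a rough illustration of the lossless half of that mix, here is a minimal Python sketch of square polar mapping on a single pair of quantized values. The `uncouple` function follows the decode-side rules of the Vorbis I specification as best understood here; `couple` is simply one forward mapping that inverts it, written for this example, and is not the reference encoder's code. Both function names are made up for the sketch.

```python
def couple(left, right):
    """Forward mapping: (left, right) -> (magnitude, angle).

    Hypothetical helper: one possible inverse of `uncouple`, not libvorbis code.
    The larger-magnitude value is kept as 'magnitude'; 'angle' stores the
    difference, with its sign recording which channel was larger.
    """
    if abs(left) >= abs(right):
        magnitude = left
        angle = left - right if left > 0 else right - left   # always >= 0
    else:
        magnitude = right
        angle = left - right if right > 0 else right - left  # always < 0
    return magnitude, angle


def uncouple(magnitude, angle):
    """Inverse mapping: (magnitude, angle) -> (left, right)."""
    if magnitude > 0:
        if angle > 0:
            return magnitude, magnitude - angle
        return magnitude + angle, magnitude
    if angle > 0:
        return magnitude, magnitude + angle
    return magnitude - angle, magnitude


# Exhaustive round trip over a small integer range.
for l in range(-8, 9):
    for r in range(-8, 9):
        assert uncouple(*couple(l, r)) == (l, r)
```

When the two channels carry nearly the same value, the angle stays close to zero, which is what makes the coupled pair cheaper to entropy code. Lossy point/elliptical coupling goes further by discarding part of that angle information and is not shown here.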
Because surround is implemented in terms of pairwise couplings, the effect in aggressive surround compression will be similar to that in stereo. In addition, as quality mode drops and the encoder leans more heavily on point/elliptical stereo, the stability of the surround imaging also decreases in proportion to other fidelity losses.
I've compressed the two short clips used in the previous demo at multiple decreasing quality (-q) settings to illustrate this effect. In the left column, the clip is compressed at a given -q level but completely uncoupled so there's no surround image degradation. In the right column, the clip is lossily coupled so that the imaging degrades gracefully along with other aspects of reproduction fidelity:
(See the previous demo page for test samples to verify your surround setup is functioning properly.)
(Yes, that's really a 5.1 encode of the Tron clip in 61 kbps. It's a particularly easy clip to encode.)
It's interesting to note the amount by which coupling further compresses the audio. This is an effect both of losing more information and of employing an entropy encoding backend that works with groups of channels rather than handling each channel independently.
How much of the bitrate gain above is purely the result of pairwise coupling? Which coupling order produces the best results? To answer these questions, I simply tried most of the pairwise combinations using an uncoupled entropy coder. This does not give any absolute sense of the entropy/energy compaction achieved by a given combination, but it should give a good first order relative ordering of which technique is working best and when. (There are a few possible useful combinations I've not yet tried; more on this in the 'todo' section.)
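The comparison described above can be sketched in a few lines of Python: apply each candidate transform, cost every output channel on its own with a first-order entropy estimate, and sort. This is only a toy model of the idea. The two strategies below (`uncoupled` and `difference`) are hypothetical stand-ins rather than the arrangements actually tested, the data is made up, and a real run would use the encoder's own codebooks rather than an empirical entropy estimate.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Total first-order empirical entropy of a symbol sequence, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def rank_strategies(left, right, strategies):
    """Return (name, estimated_bits) pairs, cheapest first."""
    costs = []
    for name, transform in strategies.items():
        a, b = transform(left, right)
        # Each output channel is costed independently: an 'uncoupled' coder.
        costs.append((name, entropy_bits(a) + entropy_bits(b)))
    return sorted(costs, key=lambda item: item[1])


# Hypothetical stand-in strategies for illustration only.
def uncoupled(l, r):
    return list(l), list(r)


def difference(l, r):
    return list(l), [x - y for x, y in zip(l, r)]


# Toy data: two identical channels, the best case for any pairwise coupling.
left = [0, 1, 2, 3, 0, 1, 2, 3]
right = [0, 1, 2, 3, 0, 1, 2, 3]
ranking = rank_strategies(
    left, right, {"uncoupled": uncoupled, "difference": difference}
)
```

As in the text, the numbers are meaningful only relative to each other; they say which transform compacts energy better on this data, not what bitrate a real coupled backend would reach.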
Coupling arrangements tested include:
An example of each coupling strategy in practice can be heard below, as processed at quality setting '1':
Coupling moves from lossy to lossless between quality setting 4 and 6 (at 6, coupling is fully lossless unless using bitrate management). At present, the Vorbis reference encoder will use only one coupling setup during an encoding run, and we see that lossless coupling is of relatively little use. Because the encoder cannot enable it selectively, only where it's useful, it hurts as often as it helps. Given the current 'one strategy for the whole run' scheme, we adopt a strategy of switching off explicit pairwise coupling above quality setting 4, and rely on coupled codebook design to realize bitrate benefits above that point.
The testing above suggests that lossless and near-lossless pairwise coupling is a near-wash when applied globally, but the earlier 'stereo abuse' experiment from the first demo always showed some improvement over uncoupled streams. Above quality setting four, no coupling strategy without a coupled backend showed energy compaction. This suggests that a coupled entropy backend is consistently useful, even in the absence of pairwise coupling. We thus implement a multiple-of-dimension-5 VQ backend that encodes interleaved channels, and disable pairwise coupling at q5 and above.
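To make 'encodes interleaved channels' concrete, here is a small Python sketch of interleaving per-channel vectors and cutting the result into groups whose size is a multiple of the channel count, so that each VQ entry spans the same frequency bin across all channels. The function names, the five-channel toy data, and the choice of exactly one bin per group are assumptions for illustration; the actual codebook dimensions and channel handling in the encoder are not shown here.

```python
def interleave(channels):
    """Interleave equal-length channel vectors sample by sample."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [ch[i] for i in range(length) for ch in channels]


def partition(vector, dim):
    """Split a flat vector into consecutive VQ-sized groups of `dim` values."""
    if len(vector) % dim:
        raise ValueError("vector length must be a multiple of dim")
    return [tuple(vector[i:i + dim]) for i in range(0, len(vector), dim)]


# Five toy channels with three frequency bins each (values are made up:
# the tens digit is the channel index, the units digit is the bin index).
channels = [[10 * c + b for b in range(3)] for c in range(5)]
groups = partition(interleave(channels), 5)
```

Here `groups[0]` is `(0, 10, 20, 30, 40)`: bin 0 from every channel lands in one vector, so a single codebook entry can exploit correlation between channels without any explicit pairwise coupling step.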
Well, final for now.
The reductions realized by the new channel coupling always show an improvement over uncoupled encoding as well as the 'stereo abuse' strategy from last month. The most interesting region of reduction is q4 and below, where the new coupling can produce bitrates less than half those of the previous release. Percentage improvement falls as the quality setting climbs, but a fairly constant (roughly 100 kbps) improvement is realized all the way to q10.
The following samples demonstrate the difference between the encoders at each quality setting.
Eventually it will be time to explore and refine the noise normalization model. The 'first cut' was intended to change as little as possible at a given time, rather than striking out in multiple new directions at once. The generalization of the 'classic' noise norm and quantization behavior has taken the needs of future work into account.
vorbis-tools is ready to go, but libao still needs 1) a finished/updated MacOS driver (nearly ready), 2) more sanitizing of available debug information, and 3) consistent, documented concurrency behavior (e.g., what do we do if we try to open a device that's blocked in another app? Fail? Block? It's a simple question, but one for which ao currently does not have a consistent or documented answer).