Theora: Thusnelda project update 20090507

Overview

Since the last update and alpha release, work has centered on two basic tasks: correcting the substantial energy leakage in Theora's forward DCT and optimizing the quantization matrices (and matrix selection).

How are we doing?

Detail improvement

Here's an early example of Thusnelda, with some early quant matrix tuning and the new forward DCT discussed below, versus Theora 1.0 (same encoder parameters, equal bitrates):

Above: Original Theora

Above: Thusnelda with new fDCT + slightly improved quant matrix

A graph!

Greg Maxwell has been doing automated regression and comparison testing of the ongoing Thusnelda work against previous versions of Theora, and because there's so much anecdotal FUD flying around about Thusnelda and (especially) h264, he threw h264 (the x264 encoder) into the testing mix too. The following PSNR chart is data collected against the 'Akiyo' QCIF test clip:

Left: Thusnelda PSNR graph for Akiyo QCIF test sample (corrected)

The important thing to note is that objective error steadily decreases from Theora, to the SVN version of Thusnelda, to the early experimental Thusnelda work that includes some matrix optimization (but not yet adaptive quantization). Also worth noting is that something is very, very wrong with Theora support in older versions of ffmpeg, which outside reviewers, for some reason, insist on using to compare Theora against other codecs. The bug is not actually in ffmpeg2theora; the same ffmpeg2theora version linked against a recent ffmpeg does not exhibit the problem.

Earlier we had a graph showing the most recent Thusnelda work slightly exceeding x264's PSNR for this clip as rate climbed. An alert community member repeated our process and found (yet another) ffmpeg bug that was mishandling the h264 colorspace and thus penalizing x264 in the test. The graph above is the corrected data, and Thusnelda no longer overtakes x264's PSNR score on this clip. If you're wondering what the point of this comparison was, there's a lot more to say below about this graph and why we included it.

Let me reiterate, and this is important, as folks have run way too far cherrypicking quotes from this update: both before and after the correction, this graph shows only that Theora is improving. PSNR means very little when comparing Theora directly to x264. PSNR is an objective measure that does not represent perceived quality (though the two correlate), and PSNR measurements have always been especially kind to Theora. None of these PSNR measurements, including clips where Thusnelda 'wins', mean that Thusnelda beats x264 in perceived quality, as it certainly does not (yet ;-), only that the gap is closing even before the task of detailed subjective tuning has begun in earnest.
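
For readers unfamiliar with the measure, here is a minimal sketch of how PSNR is conventionally computed for 8-bit samples. It is not the tooling Greg used; the function name psnr and the flat-list frame representation are assumptions made for illustration, and the update does not say whether the charted values cover luma only or all planes.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values.

    `reference` and `decoded` are flat sequences of samples (for example,
    the luma plane of one frame); `peak` is the maximum possible sample value.
    """
    if len(reference) != len(decoded):
        raise ValueError("frames must have the same number of samples")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean square error over every sample position.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: there is no error to measure
    return 10.0 * math.log10(peak * peak / mse)

if __name__ == "__main__":
    original = [100, 120, 140, 160]
    decoded = [101, 119, 141, 159]   # every sample is off by one
    print(round(psnr(original, decoded), 2))
```

Because the result depends only on sample-wise squared error, two encodes with the same PSNR can look very different, which is why it is safest as a way of comparing a codec against itself.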

Forward DCT

The original VP3 was designed with a forward/inverse DCT pair without perfect reconstruction that exhibits substantial and highly nonuniform energy leakage. It appears that the only real consideration in the design and implementation of the original transform pair was speed on a single platform [a classic case of premature optimization].

Original transform error

The peak and mean square error charts (values arranged by position in the 8x8 output matrix) make clear just how poor the original forward DCT is. (This is an excerpt from the full test and is representative of the results across all input conditions.)

IEEE1180-1990 test results (VP3):
Input range: [-256,255]
Sign: 1
Iterations: 10000

Peak absolute values of errors:
   3   3   2   2   2   2   2   2
   2   2   2   2   2   2   2   2
   2   3   2   2   2   2   2   2
   2   2   2   2   2   2   2   2
   2   2   2   2   2   2   2   2
   2   2   2   2   2   2   2   2
   2   3   2   2   2   2   2   2
   2   2   2   2   2   2   2   2
Worst peak error = 3 (FAILS spec limit 1)

Mean square errors:
   2.1289   0.9616   0.6611   0.3385   0.3458   0.6426   0.5268   0.3499
   0.4746   0.6312   0.6130   0.4239   0.4310   0.6287   0.6312   0.4315
   0.4706   0.6238   0.6300   0.4228   0.4159   0.6278   0.6357   0.4191
   0.3642   0.5461   0.5286   0.3527   0.3467   0.5368   0.5413   0.3405
   0.3483   0.5285   0.5463   0.3531   0.3499   0.5389   0.5294   0.3421
   0.4331   0.6090   0.6244   0.4272   0.4218   0.6296   0.6172   0.4209
   0.4164   0.6225   0.6191   0.4248   0.4285   0.6206   0.6331   0.4269
   0.3419   0.5315   0.5428   0.3586   0.3560   0.5299   0.5390   0.3482
Worst pmse = 2.128900 (FAILS spec limit 0.06)
Overall mse = 0.523162 (FAILS spec limit 0.02)
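
The per-position statistics in a report like this can be gathered along the following lines. This is a rough sketch under stated assumptions, not the actual test harness and not the exact procedure the standard prescribes: the reference is a floating-point orthonormal DCT, and the two candidate transforms (truncating_dct and rounding_dct) are hypothetical stand-ins that merely truncate or round after each 1-D pass. Neither is VP3's or Thusnelda's fDCT.

```python
import math
import random

N = 8

# Orthonormal DCT-II basis, precomputed: BASIS[k][n].
BASIS = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
          * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
          for n in range(N)] for k in range(N)]

def dct_1d(v):
    """8-point DCT-II of a sequence, in floating point."""
    return [sum(b * x for b, x in zip(BASIS[k], v)) for k in range(N)]

def dct_2d(block, post=lambda x: x):
    """Separable 2-D DCT (rows, then columns); `post` is applied after each pass."""
    rows = [[post(c) for c in dct_1d(r)] for r in block]
    cols = [[post(c) for c in dct_1d([rows[y][x] for y in range(N)])]
            for x in range(N)]
    # cols[x][y] holds vertical frequency y of column x; transpose back.
    return [[cols[x][y] for x in range(N)] for y in range(N)]

def truncating_dct(block):
    """Hypothetical 'leaky' transform: truncates toward zero after each pass."""
    return dct_2d(block, post=math.trunc)

def rounding_dct(block):
    """Hypothetical better-behaved transform: rounds to nearest after each pass."""
    return dct_2d(block, post=round)

def error_stats(candidate, iterations=1000, lo=-256, hi=255, seed=1):
    """Return (peak error matrix, per-position MSE matrix, overall MSE)."""
    rng = random.Random(seed)
    peak = [[0] * N for _ in range(N)]
    sq_sum = [[0.0] * N for _ in range(N)]
    for _ in range(iterations):
        block = [[rng.randint(lo, hi) for _ in range(N)] for _ in range(N)]
        ref = dct_2d(block)          # full-precision reference
        out = candidate(block)
        for y in range(N):
            for x in range(N):
                err = out[y][x] - round(ref[y][x])
                peak[y][x] = max(peak[y][x], abs(err))
                sq_sum[y][x] += err * err
    pmse = [[s / iterations for s in row] for row in sq_sum]
    overall = sum(map(sum, pmse)) / (N * N)
    return peak, pmse, overall

if __name__ == "__main__":
    for name, fn in (("truncating", truncating_dct), ("rounding", rounding_dct)):
        peak, pmse, overall = error_stats(fn, iterations=500)
        print(name, "worst peak =", max(map(max, peak)),
              "worst pmse = %.4f" % max(map(max, pmse)),
              "overall mse = %.4f" % overall)
```

Even in this toy setup, the transform that truncates shows noticeably larger peak and mean square error than the one that rounds, which is the same kind of gap the reports here quantify for the real transforms.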

Improved transform

Although the transform pair is non-perfect (the inverse does not always, and cannot always, reproduce exactly what was originally input to the forward transform), this performance can still be improved substantially. Peak and mean square error of the corrected transform, now in Thusnelda (again, a representative excerpt):

IEEE1180-1990 test results (Thusnelda r15940):
Input range: [-256,255]
Sign: 1
Iterations: 10000

Peak absolute values of errors:
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   0
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   1
Worst peak error = 1 (meets spec limit 1)

Mean square errors:
   0.0024   0.0094   0.0006   0.0041   0.0100   0.0033   0.0105   0.0062
   0.0004   0.0088   0.0013   0.0010   0.0001   0.0022   0.0016   0.0004
   0.0033   0.0022   0.0014   0.0006   0.0006   0.0004   0.0028   0.0013
   0.0005   0.0022   0.0006   0.0002   0.0003   0.0006   0.0003   0.0000
   0.0003   0.0006   0.0001   0.0004   0.0003   0.0007   0.0012   0.0004
   0.0006   0.0011   0.0008   0.0003   0.0006   0.0016   0.0008   0.0004
   0.0001   0.0013   0.0010   0.0006   0.0009   0.0002   0.0016   0.0005
   0.0001   0.0011   0.0005   0.0003   0.0002   0.0013   0.0003   0.0002
Worst pmse = 0.010500 (meets spec limit 0.06)
Overall mse = 0.001563 (meets spec limit 0.02)

We can see that although some error remains, it is reduced to the point of negligibility: overall mean square error falls from 0.523162 to 0.001563, a factor of more than 300 (over two orders of magnitude), and every figure now meets its spec limit. Despite the lack of perfect reconstruction, we can now say that Theora/Thusnelda has a good DCT implementation.

Subjective/Objective Improvement

The substantial energy loss in VP3's forward DCT is one of two factors that contribute to VP3's well-deserved reputation for losing fine detail. Originally, we believed that the leaky forward DCT contributed a small amount toward the fuzziness problem and that the suboptimal quantizer matrices accounted for the lion's share of the effect.

This proves to be incorrect; at many rates the leaky forward DCT turns out to be an equal if not greater contributor to the fine detail problem. In addition, Tim unexpectedly found that the first AC coefficient in the original transform had a higher error rate than most coefficients, and this was contributing to the 'blockiness' of VP3/original Theora at high bitrates. The problem was not solely due to an overly coarse minimum DC quantizer as we thought. Correcting the transform also reduces blockiness in smooth gradients and surfaces.

Quantization matrices

Work has just begun on optimizing the quantization matrices, both in terms of the matrices themselves and in the use of multiple quantizer index selection for adaptive quantization within a single frame. At this time, we have some early work on matrix element selection, aimed at bringing the matrices more in line with the contrast sensitivity curve of the Human Visual System (HVS).
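
To make concrete what a quantization matrix does, here is a small sketch of the general idea: each DCT coefficient is divided by the step size at its position and rounded, so positions with larger steps keep less precision. The toy_matrix below is purely hypothetical (a linear ramp with frequency) and is not one of Theora's matrices; a matrix shaped by a real contrast sensitivity curve would not be a simple ramp, and Theora's actual quantizer has details this omits.

```python
def toy_matrix(base=8, slope=4):
    """Hypothetical 8x8 matrix: step size grows with horizontal + vertical frequency."""
    return [[base + slope * (x + y) for x in range(8)] for y in range(8)]

def quantize(coeffs, qmatrix):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

def dequantize(levels, qmatrix):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]

if __name__ == "__main__":
    q = toy_matrix()
    coeffs = [[90] * 8 for _ in range(8)]   # same value at every frequency
    levels = quantize(coeffs, q)
    recon = dequantize(levels, q)
    # The low-frequency corner survives almost intact; the high-frequency
    # corner, with its much larger step, is reconstructed far less accurately.
    print("step", q[0][0], "->", recon[0][0], "| step", q[7][7], "->", recon[7][7])
```

Tuning the matrices amounts to choosing where that loss of precision lands, ideally at the frequencies where the eye is least likely to notice it.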

Remaining work

Although a considerable amount of tuning and tweaking remains (and let's not underestimate the magnitude of that work), the last substantial piece of missing foundation work is Adaptive Quantization, the second substantial aspect of overall quantizer matrix optimization. Hopefully this will be the subject of the next Thusnelda demo/update page.

An aside about the Akiyo graph

More than one technically minded reader has asked what the point of the Akiyo graph is when PSNR measurements across different codecs don't tell anything conclusive about the codecs being compared (especially different formats with slightly different color subsampling definitions).

That's a valid question. The original reason we made this graph was to offer a rebuttal to a quite sloppy paper that both incorrectly claimed x264 exceeded Theora PSNR by nearly 20dB (!) on this specific clip and incorrectly implied that PSNR comparisons conclusively indicate relative codec superiority. This paper was beginning to see wide distribution as yet another piece of 'evidence' that Theora has no hope of competing and was not worth considering for use. Although we've contacted the author, he's not yet shown any inclination to correct the errors.

Anyway, here is a copy of the ludicrously inaccurate graph which, nonetheless, plenty of people were taking seriously:

Akiyo PSNR graph as presented in Halbach 2009.

Even if PSNR doesn't mean that much comparing codecs, you still might look at that graph and think, "Wow, if it's that much worse, it must mean Theora's a complete joke."

Greg Maxwell regenerated the PSNR data carefully with consistent software tools, and I opted to include a version of his graph with my update as one way of publishing the corrected data. He kept the x264 line because it added a little scale and context to the ballpark numbers. I had also pointed out as 'amusing' that experimental Thusnelda exceeded x264's measured PSNR numbers on that graph, which highlights that PSNR numbers can't be trusted in a codec comparison (Thusnelda clearly did not look better than x264 on that clip despite the higher measure). It was a means of rebutting the entire methodology of the paper.

Original Akiyo 'rebuttal' graph with ffmpeg bug that penalized x264

'Publishing' the graph like that drew well-deserved scrutiny and unfortunately our own data was also off (although by considerably less). ffmpeg had another bug we didn't know about which caused it to mishandle the colorspace on x264 output, so the x264 PSNR value was too low by 1-4dB. Greg fixed the error in the data collection and immediately set about collecting new measures:

Corrected Akiyo graph

Sadly, the updated graph can't be used to drive home that 'PSNR can't be used to compare codecs directly', but the point is still correct. It is also still correct to look at the graph and say 'Thusnelda is improving, and by a lot', as PSNR is a perfectly valid tool to compare a codec to itself on individual clips.

Monty