While some changes cause large metric differences, we've found that in
Daala most changes are much smaller. Many are near the limit of what we
can reliably measure, yet they still accumulate into larger gains.
Accuracy matters most for these small changes: the quantizer bound
error shrinks, but so does the acceptable error.
When the codec improves, the quantizers generally don't show a
corresponding shift upward in quality, because they essentially set the
error bound. For example, here is a graph of Daala's improvement over
one year, during which the rate decreased by 50% (I've drawn in the
quantizer mapping, as the quantizers aren't numbered on the graph) [1]:
https://people.xiph.org/~tdaede/quantizer_mapping.png
The rate metric defined in the paper can be computed offline. The anchor
ranges only need to be computed once, and I plan to add the resulting
ranges to the test document. The changes required to use the ranges are
small. In my SciPy implementation, the integration range defined by the
intersection:
p0 = max(ya[0],yb[0])
p1 = min(ya[-1],yb[-1])
are replaced by the ranges from the anchor:
p0 = yr[0]
p1 = yr[1]
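To make the substitution concrete, here is a minimal NumPy-based sketch of a BD-rate calculation with the integration range as a parameter. The function name, argument names, and the optional anchor_range parameter are my own illustration, not the actual implementation; it follows the usual approach of fitting log-rate as a cubic polynomial of quality and integrating over the chosen range:

```python
import numpy as np

def bd_rate(rate_a, quality_a, rate_b, quality_b, anchor_range=None):
    """Percent rate difference of B relative to A over the
    integration range (hypothetical sketch, not the real code)."""
    # Fit cubic polynomials of log-rate as a function of quality.
    pa = np.polyfit(quality_a, np.log(rate_a), 3)
    pb = np.polyfit(quality_b, np.log(rate_b), 3)
    if anchor_range is None:
        # Default: intersection of the measured quality ranges,
        # as in the p0/p1 expressions above.
        p0 = max(quality_a[0], quality_b[0])
        p1 = min(quality_a[-1], quality_b[-1])
    else:
        # Fixed range taken from the anchor (yr[0], yr[1]).
        p0, p1 = anchor_range
    # Average log-rate of each curve over [p0, p1].
    ia, ib = np.polyint(pa), np.polyint(pb)
    avg_a = (np.polyval(ia, p1) - np.polyval(ia, p0)) / (p1 - p0)
    avg_b = (np.polyval(ib, p1) - np.polyval(ib, p0)) / (p1 - p0)
    # Convert the log-rate difference to a percentage.
    return (np.exp(avg_b - avg_a) - 1.0) * 100.0
```

As a sanity check, if codec B always needs half the rate of codec A at the same quality, this returns roughly -50%.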
The BD-Rate calculation as implemented in JCT-L1100 does not enforce a
minimum number of samples within the integration range, but in practice,
due to the wide spacing of points, there are usually at least 2 points
within the range. The testing document currently specifies at least 4
points within the range. More points give a better approximation of the
real curve, but cost more computation time. What do you think the best
value is here?
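If we do settle on a minimum, enforcing it is cheap. Here is a hypothetical helper (names are mine) that counts the measured quality points falling inside the integration range and rejects a run that doesn't meet the threshold:

```python
import numpy as np

def check_points_in_range(quality, p0, p1, min_points=4):
    """Count quality samples inside [p0, p1] and require at least
    min_points of them (illustrative sketch, not the real code)."""
    quality = np.asarray(quality)
    n = int(np.count_nonzero((quality >= p0) & (quality <= p1)))
    if n < min_points:
        raise ValueError(
            "only %d points in integration range [%g, %g]; "
            "need at least %d" % (n, p0, p1, min_points))
    return n
```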
[1] raw data:
https://arewecompressedyet.com/?r%5B%5D=master-2016-01-01-0824738&r%5B%5D=master-2014-12-10T16-13-14.342Z&s=ntt-short-1