Update 14 May 2018:
The algorithm presented here was greatly simplified and a more detailed
explanation provided in
this unpublished manuscript.
We submitted that manuscript to IEEE Signal Processing Letters in 2008, but one
of the reviewers commented that, "Surely this has been done before."
So it had been, by Thomas R. Fischer:
"A Pyramid Vector
Quantizer." IEEE Transactions on Information Theory, 32(4):568–582,
July 1986, which we found soon after.
There are some small differences in the enumeration proposed here and the one
described by Fischer.
Since our goal was to use this in a royalty-free audio codec (then
CELT, later
Opus), we chose the version that was
more than twenty years old (Fischer's).
The original note below is thus mostly a historical curiosity.
CEFT [VM07], CELT, and other speech codecs quantize n-dimensional unit vectors using a unit-pulse codebook. A code vector y is formed by adding a fixed number, m, of unit pulses:
m-1 | ||
y = | ∑ | s_{k} · ε_{jk} . |
k=0 |
Here s_{k} is the sign of the kth pulse, j_{k} is the position in the code vector and ε_{jk} is the elementary basis vector for dimension j_{k}. The signs are constrained so that if there is more than one pulse with a given position, j, they all have the same sign. That is, if j_{i} = j_{k} , then s_{i} = s_{k} . The resulting code vector y is not a unit vector unless m = 1, but a unit vector can easily be obtained by normalizing it when m > 1 .
This quantization does not provide an optimal tessellation of the n-dimensional unit sphere, but such tessellations are not even known for arbitrary dimension. Good tessellations can be found by solving large energy minimization problems, but for high rates the resulting codebooks do not fit in memory. Some good solutions are known for specific dimensions, such as n = 24 [Sloa81,AB88]. They can similarly be constructed from a regular tessellation of a circular section of the n-1 dimensional plane [HZ97], but enumerating the points in such a section faces similar problems. Regular subdivision of the convex regular polytopes (higher-dimensional analogs of the Platonic solids) provide another alternative, but are more computationally expensive to optimize over and more limited in the size of the codebooks they can produce [Burr89]. This limits the granularity of rate allocation between bands and makes extremely low rates more difficult to achieve.
For constant bit-rate codecs, every possible code vector must be encoded with the same number of bits. If there are V(n,m) unique code vectors of dimension n with m pulses, then this is most easily done by assigning every code vector an index and storing an integer between 0 and V(n,m)-1 . Two procedures are then required: one to translate a given code vector into an index, and one to translate an index into a code vector.
We start by counting the number of possible code vectors. This is like counting the number of combinations of m items taken from a set of size n with replacement, except that every unique element also has a sign bit associated with it. This last condition eliminates the chances of finding a simple closed-form expression, however we can write down a simple recurrence relation:
V(n,m) = V(n-1,m) + V(n,m-1) + V(n-1,m-1) . |
The first term corresponds to the number of code vectors that do not have any pulses in the first dimension; this is just the number of ways of distributing the pulses into the remaining dimensions. The second term corresponds to the number of code vectors that do have at least one pulse in the first dimension; this is the number of ways of distributing the remaining pulses into any of the dimensions. The last term, however, double-counts the number of code vectors that have exactly one pulse in the first dimension; this is the number of ways of distributing the remaining pulses into the remaining dimensions. This double-counting is done to account for the sign of each pulse. If, when distributing the remaining pulses, we place another one in the first dimension, then we have no choice of sign for the first pulse we placed. It must be the same as the one (or more) we already placed there. However, if this was the only pulse in the first dimension, then there are two choices: one for the positive sign, and one for the negative sign. So these possibilities are counted twice. Two base cases limit the recursion. There is exactly one way to place zero pulses, so V(n,0) = 1. Similarly, there is no way to place a non-zero number of pulses in a zero-dimensional vector, so V(0,m) = 0, m > 0. The closed-form solution to this recurrence is given by
V(n,m) = 2n · _{2}F_{1}(1-m,1-n;2;2) , |
where _{2}F_{1}(a,b;c;z) is the hypergeometric function. This function is not easy to evaluate directly, but one can compute a small table of values of V(n,m) from the recurrence, which can be accessed in constant time.
To translate a code vector into an index, all we have to do is partition the the total number of possibilities along these lines. Let i be the index we are computing, j be the index of the current dimension we're considering, k be the number of pulses we've already placed, and s_{k} and j_{k} be the signs and positions of the pulses. We require that these be sorted in ascending order on position, so that j_{k} ≤ j_{k+1}. Also, let U(n,m) = V(n,m-1) + V(n-1,m-1) be the number of vectors that include at least one pulse in the first dimension.
If k = 0, we haven't placed any previous pulses, so we always require a sign. In this case we let p=U(n-j,m-k) . On subsequent iterations, we will start with j=j_{k-1}, and thus don't require a sign. In that case, p=½U(n-j,m-k) . Now, if j < j_{k} , then we want to skip all of these possibilities, so we add p to i. Since the pulses are in sorted order, we know there's no more in dimension j, so we set j=j+1 and update p=U(n-j,m-k) . This process is repeated until j = j_{k} . If this is the first pulse, or if j_{k-1} ≠ j_{k} , then we need to encode a new sign for this pulse. This is done by partitioning the number of possibilities in half. So if s_{k} = -1, then we add ½p to i. Now we set k=k+1 and return to the beginning to encode the next pulse. When all the pulses have been processed i will be a unique index between 0 and V(n,m)-1 .
To recover the code vector corresponding to a given index, we simply reverse the process. We let i be an index computed by the above procedure, and again start with j=0 and k=0. When k>0, we start with p=½U(n-j,m-k) . If i < p, then we've got another pulse in the same dimension as the previous pulse. In this case we set j_{k}=j_{k-1} , s_{k}=s_{k-1} , k=k+1, and move on to the next pulse. For the first pulse, we instead let p=U(n-j,m-k) . Now while i ≥ p, there must be no more pulses in dimension j, so we subtract p from i, set j=j+1, and update p=U(n-j,m-k) . Finally, when i < p, we know that the code vector must contain at least one pulse in dimension j, so we set j_{k} =j. If i < ½p, then we set s_{k}=1. Otherwise s_{k}=-1, and we subtract ½p from i. Finally we set k=k+1 and return to the beginning to decode the next pulse. Eventually, we will have decoded m pulses and i will be 0.
A simple program that enumerates all possible pulse vectors for every dimension up through 10 and with each number of pulses up through 9 can be found here: cwrs.c. It should build on any Unix system with
make cwrs
(no Makefile required), or on other platforms with the compiler of your choice. It generates the list of pulses for each index, recovers the index from the pulse list, checks to make sure they match, converts the pulse list to a pulse vector, and then converts that back to a list of pulses, and checks to make sure those match. For each code vector it prints out the current index out of the total number in the codebook, the list of pulses, and the reconstructed code vector. Higher dimensions and larger numbers of pulses are possible so long as the total number of code vectors can fit in a 32-bit integer. Larger codebooks require extended-precision arithmetic.
It's possible to compute V(n,m) without relying on storage for a table in O(m) time by grouping the possibilities by the number of unique positions (and thus the number of sign bits required):
m | ||
V(n,m) = | ∑ | 2^{k}C(n,k)C(m-1,k-1) |
k=1 |
where C(n,m) is the
binomial
coefficient.
This can be enabled by toggling the #if 0
on line 33 of the code.
This slows the O(n+m) table-driven algorithm down to
O(nm+m^{2}).
For practical sizes of m and n the real execution speed winds
up being somewhat less than a factor of four slower on average, which will be
dwarfed by the computational cost of the rest of the codec.
Comments or questions? Send them to tterribe@xiph.org | Created 7 Dec. 2007, last updated 15 Dec. 2007 |