Overview

Demo 1 was mostly concerned with tools and infrastructure: identifying problems likely to affect implementation, summarizing the data that needed to be collected to make informed coupling-strategy decisions, and beginning the Vorbis surround optimization work.

Demo 2 presented data, analysis, and a complete 'first cut' implementation of optimized surround for the reference Vorbis encoder.

This week, we have a full release of all of the basic pieces with the surround work in place and some rather arcane documentation.

Releases

The library and tool releases include development and bugfixes beyond the surround work, but surround is the central improvement. (We may well be rolling new releases of the above shortly if bug reports come in, so check the Xiph.Org downloads page for more recent versions before blindly following those links.)

The libVorbis release does not include any fundamental changes to the psychoacoustics, though there's a substantial amount of new code in the quantize/normalize/couple loop. Because of all the new code, I backed out two minor psychoacoustic changes at the last moment, reverting them to the original 1.2.x behavior. They weren't bugs, just differences, and I decided to minimize the number of changes in the end; I didn't want to muddy the testing picture. This is why the surround release is 1.3.1 and not 1.3.0; some folks had grabbed and used 1.3.0 after I tagged it for release, and I wanted to avoid confusion in testing.

Psychoacoustic Changes

Two changes to psychoacoustics did survive my purge.

Aside from those two changes, 1.3.1 should give identical output to 1.2.3 except, of course, for the new 5.1 coupling modes. Oh, and except where you were tripping any bugs that have since been fixed, like the 32-bit overflow.

Disabling coupling

1.3 adds OV_ECTL_COUPLING_SET to enable/disable coupling in the encoder, along with OV_ECTL_COUPLING_GET to test whether coupling is enabled (it is enabled by default). An application may use vorbis_encode_ctl() to test or set coupling programmatically.
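
For example, here's a minimal sketch of disabling coupling programmatically. The channel count, rate, and quality below are arbitrary choices for illustration; the one real constraint is that vorbis_encode_ctl() must be called after one of the vorbis_encode_setup_* calls and before vorbis_encode_setup_init():


  #include <stdio.h>
  #include <vorbis/vorbisenc.h>

  int main(void){
    vorbis_info vi;
    vorbis_info_init(&vi);

    /* begin a 44.1kHz stereo VBR setup at quality .4 (arbitrary choices) */
    if(vorbis_encode_setup_vbr(&vi, 2, 44100, .4)) return 1;

    int arg = 0;   /* zero disables coupling; nonzero enables it */
    vorbis_encode_ctl(&vi, OV_ECTL_COUPLING_SET, &arg);

    int state = -1;
    vorbis_encode_ctl(&vi, OV_ECTL_COUPLING_GET, &state);
    printf("coupling is %s\n", state ? "enabled" : "disabled");

    /* finalize the setup; no further vorbis_encode_ctl() calls are
       accepted after this point */
    vorbis_encode_setup_init(&vi);

    vorbis_info_clear(&vi);
    return 0;
  }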

Coupling can be disabled in oggenc using the following:


  --advanced-encode-option disable_coupling
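
For example, a hypothetical full invocation (the filenames are placeholders):


  oggenc -q 3 --advanced-encode-option disable_coupling input.wav -o output.ogg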
	  

Regarding Vorbis codebook generation

Quite a lot of arcana goes into setting up Vorbis encoding. The top-level setup code is in lib/vorbisenc.c; it's an automated system for filling in the myriad internal structures involved in psychoacoustics and encoding from the higher-level descriptions contained in lib/modes. The good news is that the majority of the settings in the lib/modes header files, once you find them, are relatively accessible. Tweak a number, recompile, try it out.

Codebooks are a different matter. Generation of a valid codebook library is a multi-step process that requires both an intimate understanding of how the floor and residue backends work (down at the spec level) and familiarity with the tools and scripts in the vq/ subdirectory. Get any piece of the multi-step process wrong, and it will simply blow up at the end. Get a tiny piece only slightly wrong (like I did recently in the 16kHz codebooks) and the result will be subtle damage that perhaps only automated regression testing will find.

In short, codebooks aren't something to casually play around with. Nor are they a particularly viable path for codec tuning the way psychoacoustics are. Past experimentation with generating optimal codebooks custom to each and every file resulted in bitrate reduction of only a few percent. OTOH, I might be the only person on the planet who knows how to make new codebooks using these tools, so it's a good idea to document the process for the small handful of others who may need to try.

Floor1 codebook structure

Let's assume we need to encode a floor that uses N amplitude values. Beginning at the point where we have a vector of floor values to encode, we first divide the vector into M partitions. Each partition may have any number of values, but the value counts of all M partitions must sum to N.

Each partition, in sequence, is assigned to a 'partition class'. The partitions sharing a class must have the same number of values and should also share similar statistical behavior. Partitions belonging to a class will all be encoded into the bitstream using the same set of codebooks.

Each partition class has a number of 'subclasses'; the number of subclasses may differ from class to class. Each subclass represents a different codebook choice that can be used to encode a given value in that partition. The partition class book and the subclass books work together to perform a VQ cascade, where the class book effectively encodes the high-order bits of the partition's values (several values at once) while the subclass books encode the noisy low-order bits as scalars.

Confusing. Here's an example, using the blocksize 1 floor setup for 44 kHz q1, which is the '1024x27' floor (floor #7) in lib/modes/floor_all.h. The first two lines of the vorbis_info_floor1 structure are the important ones in this example:


  /* 7: 1024 x 27 */
  {
    8,{0,1,2,2,3,3,4,4},{3,4,3,4,3},{0,1,1,2,2},{-1,0,1,2,3},
    {{4},{5,6},{7,8},{-1,9,10,11},{-1,12,13,14}},

Let's expand that out a bit:


  /* 7: 1024 x 27 */
  {
    8, <— number of partitions
    {0,1,2,2,3,3,4,4}, <— partition classes for partitions 0 through 7 in order;
                      there are five total partition classes, 0 through 4
    {3,4,3,4,3},<— number of values encoded in each partition class
    {0,1,1,2,2},<— number of subclass books in each partition class (1<<n)
    {-1,0,1,2,3},<— relative class book numbers in book list
    {{4},<— subclass book number for class 0
     {5,6},<— subclass book numbers for class 1
     {7,8},<— subclass book numbers for class 2
     {-1,9,10,11},<— subclass book numbers for class 3
     {-1,12,13,14}},<— subclass book numbers for class 4

The example shows eight partitions belonging to five classes. In order, the partitions are assigned to classes 0, 1, 2, 2, 3, 3, 4, 4. The class 0 partition encodes three values, the class 1 partition encodes four values, the two class 2 partitions encode three values each, the two class 3 partitions encode four values each, and the two class 4 partitions encode three values each, totalling 27 floor values. Partition class 0 uses a single subclass book that encodes the full possible dynamic range. Classes 1 and 2 use two books each, and classes 3 and 4 use four. The books are then listed: first the VQ class books (which encode the subclass choices), then the scalar subclass books for each class. ('-1' indicates no book is used for that class/subclass, that is, we need to 'encode' only one value, zero.) The book numbers are relative to the floor; the setup process in vorbisenc.c will offset them to the appropriate values in the actual setup. The books themselves are listed in _floor_1024x27_books[].

How does the encoder know which subclass to choose? Each subclass book encodes a contiguous part of the full possible value range. The encoder reads the number of values each subclass book encodes and uses that count to determine subclassifications. So, although the partitioning and classing structure is fixed by the floor1 setup in the encoder, the exact range division among the subclass books is determined by the books themselves.
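
Here's a concrete sketch of that selection rule. It's a simplification of the logic in lib/floor1.c: bare entry counts stand in for the real codebook structures, using the class 4 subclass books from the example above.


  #include <stdio.h>

  /* Entry counts of class 4's subclass books {-1,12,13,14}: -1 means "no
     book", a range of exactly one value (zero); the others were built with
     ranges 1-18, 18-50 and 50-128, so they hold 18, 50 and 128 entries. */
  static const int maxval[4] = {1, 18, 50, 128};

  /* pick the lowest subclass whose range covers the value */
  static int pick_subclass(int value, const int *maxval, int nsub){
    int l;
    for(l = 0; l < nsub - 1; l++)
      if(value < maxval[l]) break;
    return l;
  }

  int main(void){
    int probe[] = {0, 5, 30, 100};
    for(int i = 0; i < 4; i++)
      printf("value %3d -> subclass %d\n",
             probe[i], pick_subclass(probe[i], maxval, 4));
    return 0;
  }

Values 0, 5, 30 and 100 land in subclasses 0 through 3 respectively, matching the ranges the books were built with.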

Floor1 codebook description file

In the vq/ subdirectory, there are scripts and codebook description files for each floor used by the current encoder. Looking inside vq/floor_44.vqs, we find the following lines for floor 1024x27:


  build line_1024x27_class1 0-16
  build line_1024x27_class2 0-8
  build line_1024x27_class3 0-256
  build line_1024x27_class4 0-64
  build line_1024x27_0sub0  0-128 
  build line_1024x27_1sub0  0-32
  build line_1024x27_1sub1  32-128
  build line_1024x27_2sub0  0-32
  build line_1024x27_2sub1  32-128
  build line_1024x27_3sub1  1-18
  build line_1024x27_3sub2  18-50
  build line_1024x27_3sub3  50-128
  build line_1024x27_4sub1  1-18
  build line_1024x27_4sub2  18-50
  build line_1024x27_4sub3  50-128
      

Each of these lines declares the creation of a single Huffman codebook with a given codeword range. Note that the ranges as specified matter to the training process but aren't encoded into the codebook itself; a codebook without a value mapping (as used by floor1) knows only that it encodes a certain number of ordered codewords. '32-128' results in the same final codebook as '0-96'. The range numbers are used by the training scripts to offset training values.

The first several lines of the floor_44.vqs file perform script setup functions.


  GO <— anything above this line is ignored as a comment
  >floor_44 <— sets the name of the output file
  =44c-1_s 44c0_s 44c1_s 44c2_s 44c3_s 44c4_s 44c5_s 44c6_s 44c7_s 44c8_s 44c9_s <— directories/subdirectories 
                                                                                    to search for training files

Floor1 codebook training

Once we've decided upon a floor setup and written the description file, we need to generate a set of blank, untrained books for the encoder to use while collecting training data. It's a very good idea to use untrained books for data collection, as existing books may be sparse, and a sparse book prevents the encoder from selecting/encoding any value that doesn't exist in the book. Floor codebooks should never be sparse, but.... (To generate a sparse floor codebook, that is, a codebook that does not include a codeword entry for any value that doesn't appear in the training data, append noguard to the build line in the vqs file. It's a really bad idea though.)

For example, we start with a new description file to regenerate the 1024x27 codebooks:


  GO
  >floor_new
  =data

  build line_1024x27_class1 0-16
  build line_1024x27_class2 0-8
  build line_1024x27_class3 0-256
  build line_1024x27_class4 0-64
  build line_1024x27_0sub0  0-128 
  build line_1024x27_1sub0  0-32
  build line_1024x27_1sub1  32-128
  build line_1024x27_2sub0  0-32
  build line_1024x27_2sub1  32-128
  build line_1024x27_3sub1  1-18
  build line_1024x27_3sub2  18-50
  build line_1024x27_3sub3  50-128
  build line_1024x27_4sub1  1-18
  build line_1024x27_4sub2  18-50
  build line_1024x27_4sub3  50-128

Next, we build the vq utils. make vq in the vq/ subdirectory should do it. Then, we run the floor codebook generation script:


  ./make_floor_books.pl new_floor.vqs
      

This generates a ton of output of the form:


  [...]
  >>> rm -f line_1024x27_4sub3.tmp
  >>> huffbuild line_1024x27_4sub3.tmp 50-128 
  Could not open file line_1024x27_4sub3.tmp
    making untrained books.
  Building tree for 128 entries
  Eliminating 50 unused entries; 78 entries remain
  Done.                                
	
  >>> cat line_1024x27_4sub3.vqh >> floor_new.vqh
  >>> rm line_1024x27_4sub3.vqh
  >>> rm -f line_1024x27_4sub3.tmp
  >>> rm -f temp30102.vqd
      

This is all as it should be. We are generating blank untrained books as there's no training data yet, and the actual range of the 50-128 book is really only 78 entries.

The output file floor_new.vqh now holds all our new, blank codebooks. We add the floor books into the encoder by hand-- remove the old codebooks from lib/books/floor/floor_books.h (remove all the static structures with '1024x27' in the name) and insert or #include the new books. The last two steps are to rebuild the encoder with -DTRAIN_FLOOR1 and then run the encoder on training data using a mode that makes use of the new codebooks. A good training set for general-purpose use is usually a few hours of varied audio and music, run through the encoder in every mode that's going to use the given codebooks.
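
Exactly how the define reaches the compile line depends on your build; with the usual autotools build, something like the following should work (a sketch; the optimization flag is illustrative):


  ./configure CFLAGS="-O2 -DTRAIN_FLOOR1"
  make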

If the training run succeeded, the encoder will have produced a number of '.vqd' files in the working directory. These are the training data files; there will be a set for every floor the encoder used in the training run, not just the new set we're training. We move these new files to our vq/data/ directory, cd back into the vq/ directory, and run the make_floor_books script a second time, this time with the training data present.
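
Concretely, assuming the training run was made from the top of the source tree, the shuffle looks something like:


  mv line_*.vqd vq/data/
  cd vq
  ./make_floor_books.pl new_floor.vqs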

This time around, the output should look like:


  [...]
  >>> rm -f line_1024x27_4sub2.tmp
  >>> cat data/line_1024x27_4sub2.vqd >> line_1024x27_4sub2.tmp
  >>> huffbuild line_1024x27_4sub2.tmp 18-50 
  Building tree for 50 entries
  Eliminating 18 unused entries; 32 entries remain
  Total samples in training set: 1744      
  Total bits used to represent training set: 6021
  Done.                                

  >>> cat line_1024x27_4sub2.vqh >> floor_new.vqh
  >>> rm line_1024x27_4sub2.vqh
  >>> rm -f line_1024x27_4sub2.tmp
  #### build line_1024x27_4sub3  50-128


  >>> rm -f line_1024x27_4sub3.tmp
  >>> cat data/line_1024x27_4sub3.vqd >> line_1024x27_4sub3.tmp
  >>> huffbuild line_1024x27_4sub3.tmp 50-128 
  Building tree for 128 entries
  Eliminating 50 unused entries; 78 entries remain
  Total samples in training set: 5      
  Total bits used to represent training set: 25
  Done.                                

  >>> cat line_1024x27_4sub3.vqh >> floor_new.vqh
  >>> rm line_1024x27_4sub3.vqh
  >>> rm -f line_1024x27_4sub3.tmp
  >>> rm -f temp30238.vqd
    

Most importantly, make sure each codebook is trained: there should be no 'untrained' lines, and the output for each built codebook should list the training-set size and the resulting number of bits needed to encode it. The numbers above are very small because we used just a short clip for this example.

At this point, floor_new.vqh in the vq/ directory holds the new books we just made, fully trained up and ready to go. Replace the blank books we previously inserted into the encoder's floor_books.h with these, rebuild the encoder without -DTRAIN_FLOOR1, and the new books are all ready to be used.

Residue codebook structure

Residue encoding splits the residue vectors to be encoded into sequential fixed size 'partitions', much like floor encoding does. Each partition is independently classified, and the classifications are explicitly encoded by that residue backend's phrasebook (also called the groupbook in some parts of the code). Phrasebooks can be multidimensional, encoding more than one classification at a time.

The residue values themselves are encoded in multiple passes through the residue vectors by the codebooks assigned to each partition class. The first pass encodes the 'most significant bits' and the last pass the 'least significant bits'. In each pass, each partition uses zero or one codebooks.

In the encoder, the setup is simpler than with floors. Looking in lib/modes/residue_44.h (the residue setup for stereo 44kHz encoding), we pick the mid-rate residue setup as an example:


  static const vorbis_info_residue0 _residue_44_mid={
    0,-1, -1, 10,-1,-1,
    /* 0   1   2   3   4   5   6   7   8  */
    {0},
    {-1},
    {  0,  1,  1,  2,  2,  4,  8, 16, 32},
    {  0,  0,999,  0,999,  4,  8, 16, 32},
  };
    

Most of this structure is a template that will be filled in later by vorbisenc.c during encoder setup. However, two parts are important here. First, the value '10' sets the total number of partition classes. The last two lines set classification metrics for the particular classification scheme used (more on that in a bit). There are ten partition classes but only nine class metric values, as the tenth class is a 'catch-all' assigned to every partition that does not classify as one of the first nine.

Next we look at the residue template:


  static const vorbis_residue_template _res_44s_3[]={
    {2,0,16,  &_residue_44_mid,
     &_huff_book__44c3_s_short,&_huff_book__44c3_s_short,
     &_resbook_44s_3,&_resbook_44s_3},
	
    {2,0,32,  &_residue_44_mid,
     &_huff_book__44c3_s_long,&_huff_book__44c3_s_long,
     &_resbook_44s_3,&_resbook_44s_3}
  };
    

This is the template that sets 44kHz stereo quality 3 mode to use the residue setup above. This mode uses two residue backends, one for short blocks and one for long blocks. Both are declared here; the first is the short block mode, the second is the long block mode (this arrangement is itself declared elsewhere by the vorbis_info_mapping setup).

Expanding out the top entry:


    {2,<— residue backend type
     0,<— lowpass type (psychoacoustics)
     16,<— partition size
     &_residue_44_mid, <— residue setup to use
     &_huff_book__44c3_s_short,<— phrasebook for VBR mode
     &_huff_book__44c3_s_short,<— phrasebook for bitrate managed mode
     &_resbook_44s_3,<— residue value books to use for VBR
     &_resbook_44s_3,<— residue value books to use for bitrate managed modes
    }
  

The list of value codebooks is straightforward. In this mode, we have ten partition classes and make a maximum of three passes through the residue value vector. We thus have a 10x3 array of value codebooks. Not all classes use all three passes:


  static const static_bookblock _resbook_44s_3={
    {
      {0},
      {0,0,&_44c3_s_p1_0},
      {0,0,&_44c3_s_p2_0},
      {0,0,&_44c3_s_p3_0},
      {0,0,&_44c3_s_p4_0},
      {0,0,&_44c3_s_p5_0},
      {0,0,&_44c3_s_p6_0},
      {&_44c3_s_p7_0,&_44c3_s_p7_1},
      {&_44c3_s_p8_0,&_44c3_s_p8_1},
      {&_44c3_s_p9_0,&_44c3_s_p9_1,&_44c3_s_p9_2}
    }
  };
    

Residue 1 partition classification

Residue 0 (now unused) and residue 1 use the same classification strategy in the current encoder. Classification is performed according to the maximum quantized value that appears in the partition and the scaled absolute sum of the partition. The first line of classification metrics in _residue_44_mid above is the maximum-value constraint, and the second line is the sum constraint. Each partition is tested against the classification constraints from class zero upward; if a partition's contents fall at or under a class's constraints, that becomes the partition's class number, and classification proceeds to the next partition. The constraints of the last class are implicitly infinite; it catches every partition that fits none of the explicit classes.
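
Here's a compact sketch of that procedure, condensed from the classification loop in lib/res0.c and using the two metric lines from _residue_44_mid above:


  #include <stdio.h>
  #include <stdlib.h>

  /* metric lines from _residue_44_mid: maximum quantized value and scaled
     absolute sum for classes 0 through 8; class 9 is the implicit catch-all */
  static const int maxmetric[9] = {0, 1,   1, 2,   2, 4, 8, 16, 32};
  static const int summetric[9] = {0, 0, 999, 0, 999, 4, 8, 16, 32};

  /* classify one partition of n quantized residue values */
  static int classify(const int *v, int n){
    int max = 0, sum = 0, c, k;
    for(k = 0; k < n; k++){
      if(abs(v[k]) > max) max = abs(v[k]);
      sum += abs(v[k]);
    }
    sum *= 100.f / n;   /* scale the sum the way the reference encoder does */
    for(c = 0; c < 9; c++)
      if(max <= maxmetric[c] && (summetric[c] < 0 || sum < summetric[c]))
        break;
    return c;           /* falls through to class 9 if nothing matched */
  }

  int main(void){
    int part[16] = {1,0,-1,1, 0,0,1,-1, 0,1,0,0, -1,1,0,0};
    printf("partition classed as %d\n", classify(part, 16));
    return 0;
  }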

Residue 2 partition classification

In the reference encoder, residue 2 is used only by coupled modes. Residue 2 classification assumes that the magnitude vector is in channel 0 and all other vectors are angle vectors. Classification proceeds as it does for residues 0 and 1, but the first line of classification constraints indicates the maximum absolute magnitude value and the second line the maximum absolute angle value for the classed partition.

Residue codebook description file

The setup of a residue codebook description file is similar to that of the floor codebook description. For our example above, the codebook description is in vq/44c3.vqs:


  GO

  >_44c3_s noninterleaved
  haux 44c3_s/resaux_0.vqd _44c3_s_short 0,16,2 10
  haux 44c3_s/resaux_1.vqd _44c3_s_long 0,64,2 10
	
  #iter 0

  #     0   1   1   2   2   4   8  16  32   +      
  #         0  99   0  99   4   8  16  32   +

  #     0   1   2   3   4   5   6   7   8   9
  # 1                               .   .   .
  # 2                               .   .   .
  # 4       .   .   .   .   .   .           .
 
  :_p1_0 44c3_s/res_part1_pass2.vqd, 8, nonseq cull, 0 +- 1
  :_p2_0 44c3_s/res_part2_pass2.vqd, 4, nonseq cull, 0 +- 1 2
  :_p3_0 44c3_s/res_part3_pass2.vqd, 4, nonseq cull, 0 +- 1 2
  :_p4_0 44c3_s/res_part4_pass2.vqd, 2, nonseq cull, 0 +- 1 2 3 4
  :_p5_0 44c3_s/res_part5_pass2.vqd, 2, nonseq cull, 0 +- 1 2 3 4
  :_p6_0 44c3_s/res_part6_pass2.vqd, 2, nonseq cull, 0 +- 1 2 3 4 5 6 7 8


  :_p7_0 44c3_s/res_part7_pass0.vqd, 4, nonseq cull, 0 +- 11
  :_p7_1 44c3_s/res_part7_pass1.vqd, 2, nonseq cull, 0 +- 1 2 3 4 5 

  :_p8_0 44c3_s/res_part8_pass0.vqd, 2, nonseq cull, 0 +- 5 10 15 20 25 30
  :_p8_1 44c3_s/res_part8_pass1.vqd, 2, nonseq cull, 0 +- 1 2 

  :_p9_0 44c3_s/res_part9_pass0.vqd, 2, nonseq, 0 +- 255 510 765 1020 1275 1530
  :_p9_1 44c3_s/res_part9_pass1.vqd, 2, nonseq, 0 +- 17 34 51 68 85 102 119
  :_p9_2 44c3_s/res_part9_pass2.vqd, 2, nonseq, 0 +- 1 2 3 4 5 6 7 8

    

Anything above the 'GO' line is ignored.

>_44c3_s noninterleaved sets the base output file name and declares that the training data comes from residue type 1 or type 2 (as opposed to type 0, which is no longer used).

haux 44c3_s/resaux_0.vqd _44c3_s_short 0,16,2 10 builds a phrasebook named '_44c3_s_short' with training data from 44c3_s/resaux_0.vqd, using 16 values from each training data line starting at offset zero, and generating codewords with 2 dimensions and 10 classification values. The resulting codebook will have a total of 100 codewords.

haux 44c3_s/resaux_1.vqd _44c3_s_long 0,64,2 10 builds a similar phrasebook for long blocks.
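
To see where the '100 codewords' comes from: a 2-dimensional phrasebook entry covers the classes of two consecutive partitions, packed radix-10. The digit ordering below is illustrative; see the phrasebook handling in lib/res0.c for the real encoder loop.


  #include <stdio.h>

  int main(void){
    int classes = 10;               /* classification values per partition */
    int c0 = 3, c1 = 7;             /* classes of two adjacent partitions */
    int word = c0*classes + c1;     /* first partition as the high digit */
    printf("codeword %d of %d\n", word, classes*classes);
    return 0;
  }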

# is a comment character; I've embedded some ASCII art to show which partitions use codebooks in which passes, as well as duplicating the classification data along the top for illustrative purposes.

Each line beginning with ':' specifies a single residue value codebook. Using the first such line as an example: _p1_0 is appended to "_44c3_s" to produce an output codebook named "_44c3_s_p1_0", in a file named "_44c3_s_p1_0.vqh". The codebook will be trained using the data in 44c3_s/res_part1_pass2.vqd; it will be eight-dimensional; nonseq marks it nonsequential (values are absolute, not deltas from the previous value); cull indicates that all values which don't appear in the training data are to be removed, making it a sparse codebook (sparse codebooks can be much smaller than fully populated codebooks; essential in an eight-dimensional codebook!); and the value range of the codebook is -1, 0 and 1.
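
The class 9 books above also illustrate how the value passes cascade: the first pass quantizes each value to a multiple of 255, the second pass refines that by multiples of 17, and the third encodes the remaining units. Here's a toy illustration of the additive refinement; it mimics the cascade only, while the real encoder searches lattice codebooks:


  #include <stdio.h>
  #include <stdlib.h>

  /* quantize v to the nearest multiple of step, clamped to +/- maxmult steps */
  static long nearest(long v, long step, long maxmult){
    long m = (labs(v) + step/2) / step;
    if(m > maxmult) m = maxmult;
    return (v < 0 ? -m : m) * step;
  }

  int main(void){
    long v = -873;                     /* residue value to encode */
    long step[3]    = {255, 17, 1};    /* per-pass quantizer steps */
    long maxmult[3] = {6, 7, 8};       /* book ranges: +-1530, +-119, +-8 */
    int p;
    for(p = 0; p < 3; p++){
      long q = nearest(v, step[p], maxmult[p]);
      printf("pass %d encodes %ld\n", p, q);
      v -= q;                          /* the leftover goes to the next pass */
    }
    printf("final quantization error: %ld\n", v);
    return 0;
  }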

Residue codebook training

First, make sure the vq utils in vq/ are built using:


  make vq;
  ln -fs latticetune res0tune;
  ln -fs latticetune res1tune;
     

Residue codebook training is a process nearly identical to floor codebook training. We begin by building blank, untrained codebooks according to the description in our setup file (training with preexisting sparse codebooks would be a disaster!). For example, to build the books in the above description file:


./make_residue_books.pl 44c3.vqs
      

This produces a collection of output .vqh files containing one new codebook each (might as well cat them together into one file). We replace the existing books of the same names in lib/books/coupled/res_books_*.h and lib/books/uncoupled/res_books_*.h, or simply add them if the books are entirely new. Rebuild the encoder with -DTRAIN_RES and -DTRAIN_RESAUX, run the encoder on the training audio, then collect the resulting .vqd files and move them to where make_residue_books.pl expects to find them. Run make_residue_books.pl again; this time the output is the completed, trained residue codebooks. Inspect the output and results for correctness, replace the first iteration's blank books with these new, complete books, and rebuild the encoder without -DTRAIN_RES and -DTRAIN_RESAUX.

Training data inspection

Training data files are simple CSV lists of varying dimension (the floor files are one-dimensional, the residue data files are multidimensional). The 'distribution' util in the vq/ directory is a simple tool for querying the overall dynamic range and distribution of values in the training data, as well as for dumping the value distribution in .vqh files produced by the training scripts. It's a tool of limited usefulness, but it's handy for spot-checking residue data sets and books for correctness.

A note on codebook size

Vorbis is unusual in that it packs all the Huffman codebooks used by the encoding process into the bitstream header. Naturally, this greatly limits the maximum packed size of the codebooks an encoder should use. Total packed codebook size must be kept under 5kB to interoperate properly with all Vorbis decoders; ideally, total codebook size should be 4kB or less.


Monty's Vorbis surround coupling work is sponsored by Red Hat Emerging Technologies.
This page is Copyright (C) 2010 Monty