How does compression work?
How can we remove redundant information?
Because computers can process data, we can rearrange the bits.
A program + a bitstream = another bitstream.
Define the complexity of a bitstream as the size of the smallest program that can reproduce it.
C(bitstream) = min(sizeof(program) + sizeof(compressed stream))
For most data we actually care about (as opposed to random bits) this is much smaller than the original.
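The true minimum over all programs cannot be computed in general, but any working compressor gives an upper bound on it. The sketch below uses Python's standard zlib module as a stand-in for "the program" and ignores the fixed size of the decompressor itself; the sample data and variable names are made up for illustration.

```python
import random
import zlib

# Highly redundant input: the same two bytes repeated 5000 times.
repetitive = b"ab" * 5000

# Pseudo-random input of the same length (fixed seed so the run is repeatable).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(10000))

# The compressed size is an upper bound on the complexity of each stream.
repetitive_size = len(zlib.compress(repetitive))
noise_size = len(zlib.compress(noise))

print(len(repetitive), "->", repetitive_size)
print(len(noise), "->", noise_size)
```

The repetitive stream shrinks to a tiny fraction of its length, while the random one does not shrink at all: it has no redundancy to remove.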
In practice we also reduce complexity (and add security!) by using the same program together with different input bitstreams to handle a whole class of data. General-purpose compression works on anything; class-specific compression techniques build on this by adding a model of the data, which is used to further rearrange the bits so the general-purpose compression works better. It's all about exposing entropy, concentrating the inherent complexity of the bitstream.
Practically, available CPU time and decoder complexity limit the compression factor.
A program designed for use in compressing a class of data is called a codec.
The term 'embarrassing' compression is used by analogy with 'embarrassingly parallel', which refers to running code on a multiple-CPU system not by doing the difficult work of rewriting it to support multiple concurrent execution threads, but just by running multiple copies of the same code, for example with different parameters or initial conditions.
A program that reproduces the original bitstream performs lossless compression.
What went in comes back out.
This is in contrast to so-called 'lossy' codecs like Vorbis and Theora.
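A minimal round-trip check of the lossless property, again using the standard zlib module as an example codec; the input text is arbitrary.

```python
import zlib

original = b"What went in comes back out. " * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the decoder reproduces the input bit for bit.
print(restored == original, len(original), len(compressed))
```

A lossy codec would fail the equality check by design; it only promises output that looks or sounds close enough to the input.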
Don't use more symbols than you need
Documented in RFC 1951.
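One way to avoid spending more bits than needed is Huffman coding, which DEFLATE combines with LZ77 matching: frequent symbols get short codes and rare ones get long codes. The sketch below only computes code lengths for a toy string; it does not build the canonical codes or the block format that RFC 1951 specifies, and the function name is hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code over data."""
    counts = Counter(data)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, unique id to break ties, {symbol: depth so far}).
    heap = [(weight, i, {sym: 0}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "aaaaaaaabbbbccd"
lengths = huffman_code_lengths(text)
huffman_bits = sum(lengths[sym] * n for sym, n in Counter(text).items())
fixed_bits = 2 * len(text)  # four distinct symbols need 2 bits each at fixed width

print(lengths, huffman_bits, fixed_bits)
```

For this input the most common symbol gets a 1-bit code, and the total drops from 30 bits at fixed width to 25.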
Bit of historical trivia: L. Peter Deutsch, the author of RFC 1951, is also the original author of Ghostscript, a PostScript language interpreter and renderer. While PostScript level 3 has a DEFLATE operator for compressing streams, the original level 2 specification contained only an operator for LZW compression. When Unisys began enforcing its patent on LZW, Peter developed a compatible non-infringing filter. It doesn't actually compress, but it does transform the data in such a way as to produce a valid bitstream. Thus the patent issues were avoided.
Finding DEFLATE itself, the non-patented alternative, to be poorly documented, he wrote RFC 1951, which helped cement the algorithm as central to internet and open source technology.
A model of the data can often inform additional compression
FLAC and Speex use predictors
The PNG image format uses one of five filter types (None, Sub, Up, Average, Paeth), which can vary from row to row.
Theora and Vorbis do a more complicated version of this.
Vorbis and Theora use the (quantized/interpolated) transform coefficients (MDCT for Vorbis, DCT for Theora) as a predictor, and of course Theora has motion vectors. These are much more complicated than the simple LPC filter used in FLAC, or the difference schemes PNG uses.
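The simplest predictor guesses that each sample equals the previous one and stores only the difference, in the spirit of PNG's Sub filter. The sketch below applies that to a synthetic slowly varying signal and hands both versions to zlib; the signal, seed, and function names are illustrative assumptions, not taken from any of the codecs named above.

```python
import random
import zlib

def delta_encode(samples):
    """Replace each byte with its difference from the previous byte, modulo 256."""
    prev = 0
    out = bytearray()
    for s in samples:
        out.append((s - prev) % 256)
        prev = s
    return bytes(out)

def delta_decode(residuals):
    """Invert delta_encode by accumulating the differences, modulo 256."""
    prev = 0
    out = bytearray()
    for r in residuals:
        prev = (prev + r) % 256
        out.append(prev)
    return bytes(out)

# Synthetic "smooth" signal: a random walk that moves by at most one per step.
rng = random.Random(0)
value = 128
walk = bytearray()
for _ in range(4096):
    value = (value + rng.choice((-1, 0, 1))) % 256
    walk.append(value)
samples = bytes(walk)

residuals = delta_encode(samples)
raw_size = len(zlib.compress(samples))
delta_size = len(zlib.compress(residuals))

print(raw_size, delta_size)
```

The prediction step is lossless and does not shrink anything by itself. It only rearranges the data so that the residuals use a handful of symbols, which the general-purpose compressor then handles far better.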
Rotate a chunk of data to a different coordinate system. Like diagonalizing a matrix.
RGB->YCrCb colour space transform (used in JPEG, Theora, analog TV)
an RGB image...
...has these components...
... and looks like this in YCrCb.
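As a rough sketch of such a colour transform, the function below converts one RGB pixel to YCrCb (more commonly written YCbCr) using the full-range coefficients from JPEG/JFIF. That choice is an assumption made for illustration; the colour spaces Theora specifies use their own primaries and value ranges.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to luma plus two chroma-difference values (JFIF-style)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

gray = rgb_to_ycbcr(128, 128, 128)
red = rgb_to_ycbcr(255, 0, 0)
print(gray, red)
```

For any gray pixel both chroma values sit at the neutral midpoint, so all of the information ends up in the luma channel. That is the "rotation" at work: correlated R, G and B values become one busy channel and two quiet ones.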
Applying a Fourier transform can concentrate complexity in smooth data.
JPEG, Theora, Vorbis all use this.
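JPEG and Theora use the discrete cosine transform (DCT), a close relative of the Fourier transform, and Vorbis uses the related MDCT. The sketch below is a naive orthonormal DCT-II written for clarity rather than speed, applied to a smooth eight-sample ramp; the function name and test signal are illustrative.

```python
import math

def dct_ii(x):
    """Naive orthonormal DCT-II of a list of numbers (O(n^2), for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        ))
    return out

ramp = [0, 1, 2, 3, 4, 5, 6, 7]
coeffs = dct_ii(ramp)

total_energy = sum(c * c for c in coeffs)
low_energy = sum(c * c for c in coeffs[:2])
fraction = low_energy / total_energy

print([round(c, 3) for c in coeffs])
print(round(fraction, 4))
```

Because the transform is orthonormal, the total energy is unchanged, but for this smooth input more than 99% of it lands in the first two coefficients.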
These techniques only go so far. To do better we need a new ingredient.
One common approach is to truncate transformed data.
Generally, one first transforms the data to concentrate complexity (mathematical or perceptual) and then truncates the output, discarding details outside the region of concentration.
One may also represent the results of the transform less accurately than would be necessary to compress losslessly.
Another is to just represent it less accurately: quantization.
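Quantization can be sketched as dividing each coefficient by a step size and rounding to the nearest integer. The coefficient values and the single uniform step below are made up for illustration; real codecs typically use a different step per frequency.

```python
def quantize(coeffs, step):
    """Map each coefficient to the nearest integer multiple of step."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]

coeffs = [9.9, -6.4, 0.0, -0.7, 0.0, -0.2, 0.0, -0.05]
step = 2.0

levels = quantize(coeffs, step)
approx = dequantize(levels, step)
max_error = max(abs(a - c) for a, c in zip(approx, coeffs))

print(levels, approx, max_error)
```

The small coefficients all collapse to zero, which is also what truncation would have done, and the long run of zeros is cheap to code. The price is a reconstruction error of up to half a step per coefficient, which is exactly the information that was thrown away.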
Campbell-Robson Contrast Sensitivity Chart
Notice how the visual boundary between the bands and the smooth gray space falls toward the right end of the chart. This is why throwing away information at high spatial frequency doesn't compromise image appearance much. Because there's a lot of information at high frequency, this adds a lot to compression efficiency.
same family as MPEG-1 and MPEG-2 video codecs
DCT + block motion compensation
Huffman coding
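Block motion compensation, listed above, can be sketched as follows: predict a block of the current frame by copying a displaced block from a reference frame, then code only the residual. This toy uses whole-pixel vectors on a tiny synthetic frame with hypothetical function names; it does not model Theora's actual block structure or vector precision.

```python
def predict_block(reference, x0, y0, size, dx, dy):
    """Copy the size x size block at (x0 + dx, y0 + dy) from the reference frame."""
    return [[reference[y0 + dy + j][x0 + dx + i] for i in range(size)]
            for j in range(size)]

def block_residual(current, predicted, x0, y0):
    """Difference between the current block at (x0, y0) and its prediction."""
    size = len(predicted)
    return [[current[y0 + j][x0 + i] - predicted[j][i] for i in range(size)]
            for j in range(size)]

width = height = 8
reference = [[(3 * x + 7 * y) % 256 for x in range(width)] for y in range(height)]
# Current frame: the same picture shifted one pixel to the right.
current = [[reference[y][max(x - 1, 0)] for x in range(width)] for y in range(height)]

# With the correct motion vector (-1, 0) the prediction is exact.
good = block_residual(current, predict_block(reference, 4, 4, 2, -1, 0), 4, 4)
# With a zero vector the residual carries the whole change.
bad = block_residual(current, predict_block(reference, 4, 4, 2, 0, 0), 4, 4)

print(good, bad)
```

When the vector matches the motion, the residual is all zeros and costs almost nothing to code; the DCT and quantization stages then only have to deal with whatever the prediction gets wrong.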
Distinguishing features
Supports two specific colour spaces
We only support 2 specific colour spaces to limit decoder complexity. These two options are at least close to most source video, and the more difficult work of mapping particular source material can be done by the encoder. Decoders are then free to concentrate on just optimizing these two options for their particular display.
The 'unknown' colour space is usually used when the encoder can't be bothered. Oddly, it can actually look "better" because it doesn't artificially limit the dynamic range, though of course the colour rendering cannot be as accurate as the specific spaces.
Theora creates packets of compressed data
identifies the data as Theora
records basic configuration information: frame size, frame rate, maximal key frame spacing, etc.
Simple (tag, value) metadata about the video.
Same format as the Vorbis comment metadata header.
Useful for quick author/title notes, but not full credits.
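The (tag, value) layout can be sketched as a vendor string followed by a counted list of "TAG=value" strings, each prefixed by a 32-bit little-endian length. This follows the Vorbis comment layout as commonly described and covers only the comment body; the packet-type byte and codec identifier that precede it in a real header packet are omitted, and the function names are hypothetical.

```python
import struct

def build_comment_body(vendor, comments):
    """Serialize a vendor string and a list of (tag, value) pairs."""
    vendor_bytes = vendor.encode("utf-8")
    out = struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(comments))
    for tag, value in comments:
        entry = f"{tag}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry
    return out

def parse_comment_body(data):
    """Parse the layout written by build_comment_body."""
    pos = 0

    def read_string():
        nonlocal pos
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    comments = []
    for _ in range(count):
        tag, _sep, value = read_string().partition("=")
        comments.append((tag, value))
    return vendor, comments

body = build_comment_body("example encoder", [("TITLE", "Demo"), ("ARTIST", "Someone")])
vendor, comments = parse_comment_body(body)
print(vendor, comments)
```

The format has no structure beyond flat strings, which is why it suits quick author and title notes but not full credits.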
Detailed configuration data for the decoder:
Contains a complete image without reference to previous frames.
The Theora equivalent of JPEG.
Most frames are specified relative to two earlier frames:
One frame per packet. Always.
How is a frame actually decoded?
Having read all the coded data:
The Ogg container lets us mix Theora video with other content
Vorbis, Speex audio!
.ogg isn't just for Vorbis
Ogg Theora makes a complete multimedia format.
Ogg gives us seeking, streaming.
Use MNG for overlays.
Developing text-based subtitle formats.
Ogg Theora gives us easy HTTP streaming
RTP payload format development just starting.
Other formats are possible. QuickTime?
Reference implementation 1.0 alpha 3.
Stable code and bitstream format.
API will change before final release!
Already have good playback support:
Encoder support:
Hope you enjoyed it.
We have some demo CD-ROMs. These contain Creative Commons licensed music and video in Vorbis, FLAC and Theora formats, along with the latest source code to our codecs.
We don't have enough for everybody, but if you don't get one or want one later, you can download the image from
http://people.xiph.org/~vanguardist/xiph-demo-alpha1.iso.