How does compression work?
How can we remove redundant information?
Because computers can process data, we can rearrange the bits.
A program + a bitstream = another bitstream.
Define the complexity of a bitstream as the size of the smallest program that can reproduce it.
C(bitstream) = min(sizeof(program) + sizeof(compressed stream))
For most data this is much smaller than the original.
In practice also we reduce complexity (and add security!) by using the same program together with different input bitstreams to handle a whole class of data. General purpose compression works on anything; class-specific compression techniques build on this by adding a model of the data which is used to further re-arrange the bits so the general-purpose compression works better. It's all about exposing entropy, concentrating the inherent complexity of the bitstream.
Practically, available cpu time and decoder complexity limit the compression factor.
A program designed for use in compressing a class of data is called a codec.
The term 'embarrassing' compression is used in analogy to 'embarrassingly parallel' referring to the running a code on a multiple-cpu system not by doing the difficult work of re-writing it to support multiple concurrent execution threads, but just by running multiple copies of the same code, for example with different parameters or initial conditions.
A program that reproduces the original bitstream performs lossless compression.
What went in comes back out.
This is in contrast to so-called 'lossy' codecs like Vorbis and Theora
Documented in RFC 1951.
A model of the data can often support additional compression
FLAC and Speex use predictors
The PNG image format uses one of six predictors, which can vary from row to row.
Theora and Vorbis do a more complicated version of this.
Vorbis and Theora use the (quantized/interpolated) MDCT coefficients as a predictor, and of course Theora has motion vectors. These are much more complicated than the simple LPC filter used in FLAC, or the difference schemes PNG uses.
Rotate a chunk of data to a different coordinate system. Like diagonalizing a matrix.
RGB->YCrCb colourspace transform (used in JPEG, Theora, analog TV)
Applying a fourier transform can concentrate complexity in smooth data.
JPEG, Theora, Vorbis all use this.
These techniques only go so far. To do better we need a new ingredient.
One common approach is to truncate transformed data.
Generally, one first transforms the data to concentrate complexity (mathematical or perceptual) and then truncates the output, discarding details outside the region of concentration.
One may also represent the results of the transform less accurately than would be necessary to compress losslessly.
Cambell-Robson Contrast Sensitivity Chart
Notice how the visual boundary between the bands and the smooth gray space falls toward the right end of the chart. This is why throwing away information at high spatial frequency doesn't compromise image appearance much. Because there's a lot of information at high frequency, this adds a lot of to compression efficiency.
One common approach is to truncate transformed data.
Hope you enjoyed it.