Ogg Theora

a Free Video Codec and Multimedia Platform

Ralph Giles

Xiph.org Foundation

Open Multimedia

  • Ogg Theora rocks
  • It's free
  • You should use it

Compression

How does compression work?

Compression

  • exploit redundancy to save space
  • the more you know about the data the better
  • two general classes: lossless and lossy

Compression

How can we remove redundant information?

Because computers can process data, we can rearrange the bits.

A program + a bitstream = another bitstream.

Compression

Define the complexity of a bitstream as the size of the smallest program that can reproduce it.

C(bitstream) = min(sizeof(program) + sizeof(compressed stream))

For typical real-world data this is much smaller than the original (truly random data cannot be compressed at all).

In practice we also reduce complexity (and add security!) by using the same program together with different input bitstreams to handle a whole class of data. General-purpose compression works on anything; class-specific compression techniques build on this by adding a model of the data, which is used to further re-arrange the bits so the general-purpose compression works better. It's all about exposing entropy, concentrating the inherent complexity of the bitstream.

Compression

Practically, available CPU time and decoder complexity limit the compression factor.

A program designed for use in compressing a class of data is called a codec.

Compression

'Embarrassing' compression

  • Compress n bitstreams in n bits!
  • Not a real reduction in complexity.

The term 'embarrassing' compression is used in analogy to 'embarrassingly parallel', which refers to running a code on a multiple-CPU system not by doing the difficult work of re-writing it to support multiple concurrent execution threads, but simply by running multiple copies of the same code, for example with different parameters or initial conditions.

Compression

A program that reproduces the original bitstream performs lossless compression.

What went in comes back out.

This is in contrast to so-called 'lossy' codecs like Vorbis and Theora.

Lossless Compression

Don't use more symbols than you need

  • storing 'aabbccaaaabcc' with 8-bit characters: 104 bits (13 bytes)
  • if there are only three symbols, we only need 2-bit characters: 26 bits (4 bytes: 25%)

Lossless Compression

  • What if some symbols are more common than others?
  • we can save more bits by using a variable-length coding: 1.5 bit characters.
  • the trick is knowing how many bits to read!

Lossless Compression

  • Our example: aabbccaaaabcc
  • 'a' is most common: let that be '0'
  • 'b' and 'c' can be 2 bits: '10' and '11'
  • this is a Huffman Code

Lossless Compression

Huffman Coding

  • we build a binary tree: the prefix determines the code length
  • Our example: 'a'->'0', 'b'->'10', 'c'->'11'
    aabbccaaaabcc -> 00101011110000101111 (20 bits, 19%)
  • this is most efficient if the symbol probabilities are negative powers of two (2^-n)
  • for other distributions, there's arithmetic coding
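
To make the prefix-code idea concrete, here is a minimal Python sketch using the code table above. It only encodes and decodes with a fixed table; building the tree from symbol frequencies is left out.

    # Minimal sketch of the prefix code above (a -> 0, b -> 10, c -> 11).
    code = {'a': '0', 'b': '10', 'c': '11'}

    def encode(text):
        return ''.join(code[ch] for ch in text)

    def decode(bits):
        # Walk the bitstring; because no codeword is a prefix of another,
        # the first match is always the right one.
        inverse = {v: k for k, v in code.items()}
        out, buf = [], ''
        for b in bits:
            buf += b
            if buf in inverse:
                out.append(inverse[buf])
                buf = ''
        return ''.join(out)

    bits = encode('aabbccaaaabcc')
    print(bits, len(bits))          # 00101011110000101111 20
    assert decode(bits) == 'aabbccaaaabcc'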

Lossless Compression

LZ77 algorithm

  • simple redundancy: remember repeated strings
  • LZ77 was the first of a series of dictionary-like compression algorithms
  • followed by LZ78, LZW, etc.
  • unlike (the infamous) LZW, LZ77 was not encumbered by patents

Lossless Compression

LZ77 algorithm

  • store new strings as literals
  • mark repeated strings with back-references
  • back references are (offset, length) pairs to copy from the last 32k of data
  • aabbccaaaabcc... -> aabbcc(6,2)(6,2)(7,1)(7,2)...
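
Here is a small Python sketch of the decoding side. Token conventions vary between implementations (the pairs in the bullet above may use a different one), so treat the offsets below as illustrative: each reference counts back from the current end of the output, and a copy may overlap its own output, which is how runs are reproduced.

    # Sketch of LZ77-style decoding. Tokens are either literal strings or
    # (offset, length) back-references; the offset counts back from the end
    # of the output so far, and copies are done one symbol at a time so a
    # reference may overlap its own output (useful for runs).
    def lz77_decode(tokens):
        out = []
        for tok in tokens:
            if isinstance(tok, str):          # literal
                out.extend(tok)
            else:                             # (offset, length) back-reference
                offset, length = tok
                for _ in range(length):
                    out.append(out[-offset])
        return ''.join(out)

    # 'aabbcc' as literals, a run of 'a's via an overlapping copy,
    # then 'bcc' copied from earlier in the output.
    print(lz77_decode(['aabbcc', (6, 2), (2, 2), (7, 3)]))   # aabbccaaaabcc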

Lossless Compression

DEFLATE algorithm.

  • a combination of LZ77 and Huffman coding.
  • forms the basis of the popular gzip program (and pkzip). It is also used in the PNG image format, in PDF and PostScript, etc.

Documented in RFC 1951.

A bit of historical trivia: L. Peter Deutsch, the author of RFC 1951, is also the original author of Ghostscript, a PostScript language interpreter and renderer. While PostScript Level 3 has a DEFLATE operator for compressing streams, the original Level 2 specification contained only an operator for LZW compression. When Unisys began pursuing licensing for LZW, on which it held a patent, Peter developed a compatible non-infringing filter. It doesn't actually compress, but it does transform the data in such a way as to produce a valid LZW bitstream. Thus the patent issues were avoided.

Finding DEFLATE itself, the non-patented alternative, to be poorly documented, he wrote RFC 1951, which helped cement the algorithm as central to internet and open source technology.

Lossless Compression

DEFLATE algorithm

  • this format has become very popular for generic lossless compression.
  • standard and patent free.
  • easy, open source implementation in zlib.
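
Since zlib ships with most systems and language runtimes, using DEFLATE is essentially a one-liner. A quick Python sketch:

    # DEFLATE via zlib, as used by gzip, PNG, etc. What went in comes back
    # out, and repetitive data shrinks dramatically.
    import zlib

    data = b'aabbccaaaabcc' * 1000
    compressed = zlib.compress(data, 9)
    print(len(data), '->', len(compressed))      # e.g. 13000 -> a few dozen bytes
    assert zlib.decompress(compressed) == data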

Lossless Compression

A model of the data can often inform additional compression

  • use the model to predict the data, and encode the error.
  • transform the data to concentrate complexity.

Lossless Compression

Predictors

FLAC and Speex use predictors

The PNG image format uses one of five predictors, which can vary from row to row.

Theora and Vorbis do a more complicated version of this.

Vorbis and Theora use the (quantized/interpolated) MDCT coefficients as a predictor, and of course Theora has motion vectors. These are much more complicated than the simple LPC filter used in FLAC, or the difference schemes PNG uses.
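
As a toy illustration of the predictor idea (far simpler than FLAC's LPC filters or Theora's motion compensation), here is a Python sketch of a 'previous sample' predictor in the spirit of PNG's Sub filter: only the prediction error is kept, and for smooth data those errors cluster near zero, which a generic entropy coder then compresses well.

    # Sketch of the simplest possible predictor: guess that each sample
    # equals the previous one, and encode only the prediction error.
    def predict_encode(samples):
        prev = 0
        residuals = []
        for s in samples:
            residuals.append(s - prev)   # store the error, not the sample
            prev = s
        return residuals

    def predict_decode(residuals):
        prev = 0
        samples = []
        for r in residuals:
            prev += r                    # prediction + error gives the sample back
            samples.append(prev)
        return samples

    smooth = [100, 101, 103, 104, 104, 102, 100]
    res = predict_encode(smooth)
    print(res)                           # [100, 1, 2, 1, 0, -2, -2]: mostly small values
    assert predict_decode(res) == smooth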

Lossless Compression

Transforms

Rotate a chunk of data to a different coordinate system. Like diagonalizing a matrix.

RGB->YCrCb colour space transform (used in JPEG, Theora, analog TV)
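
As an illustration, here is the transform written as a 3x3 matrix multiply in Python, using the full-range JPEG/JFIF (Rec. 601) coefficients. Theora and analog TV use the same luma weights but different scaling and offsets, so this is a sketch of the idea rather than the exact Theora conversion.

    # Colour transform as a matrix multiply (full-range JPEG/JFIF coefficients).
    import numpy as np

    M = np.array([[ 0.299,     0.587,     0.114   ],   # Y  (luma)
                  [-0.168736, -0.331264,  0.5     ],   # Cb
                  [ 0.5,      -0.418688, -0.081312]])  # Cr

    def rgb_to_ycbcr(r, g, b):
        y, cb, cr = M @ np.array([r, g, b])
        return y, cb + 128, cr + 128      # centre the chroma channels

    print(rgb_to_ycbcr(255, 0, 0))        # pure red: modest luma, high Cr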

RGB to YCrCb

an RGB image...

A picture of Prague

RGB to YCrCb

...has these components...

A picture of Prague, red channel A picture of Prague, green channel A picture of Prague, blue channel

RGB to YCrCb

... and looks like this in YCrCb.

A picture of Prague, Cr channel A picture of Prague, luma channel A picture of Prague, Cb channel

Lossless Compression

Transforms

Applying a Fourier transform can concentrate complexity in smooth data.

JPEG, Theora, Vorbis all use this.
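
A small Python sketch of a 1-D DCT-II shows the effect: for a smooth input almost all of the energy lands in the first couple of coefficients, leaving the high-frequency terms near zero.

    # 1-D DCT-II applied to a smooth 8-sample signal. The energy concentrates
    # in the lowest-frequency coefficients, which is what makes truncating or
    # coarsely quantizing the rest so cheap.
    import numpy as np

    def dct2(x):
        N = len(x)
        n = np.arange(N)
        return np.array([np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                         for k in range(N)])

    smooth = np.array([100, 102, 104, 106, 108, 110, 112, 114], dtype=float)
    print(np.round(dct2(smooth), 1))   # large DC term, tiny high-frequency terms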

Lossy Compression

These techniques only go so far. To do better we need a new ingredient.

  • A factor of 5, like in our example, is about the limit (best case)
  • For multimedia, we don't need bit-accurate reproduction. Just something that looks or sounds the same.
  • remove perceptual as well as mathematical redundancy!

Perceptual Compression

One common approach is to truncate transformed data.

Generally, one first transforms the data to concentrate complexity (mathematical or perceptual) and then truncates the output, discarding details outside the region of concentration.

One may also represent the results of the transform less accurately than would be necessary to compress losslessly.

Perceptual Compression

One common approach is to truncate transformed data.

Another is to just represent it less accurately: quantization.
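
A Python sketch of scalar quantization applied to transform coefficients (the coefficient values and step sizes here are made up for illustration):

    # Scalar quantization: divide by a step size, round to an integer, and
    # multiply back on decode. Detail is lost (this is the 'lossy' step), but
    # many coefficients become zero and cost almost nothing to code.
    coeffs = [856.0, -25.8, 0.0, -2.9, 0.0, -1.0, 0.0, -0.5]
    steps  = [16, 16, 20, 24, 28, 32, 40, 48]   # coarser at higher frequencies

    quantized     = [round(c / q) for c, q in zip(coeffs, steps)]
    reconstructed = [n * q for n, q in zip(quantized, steps)]

    print(quantized)        # [54, -2, 0, 0, 0, 0, 0, 0]: mostly zeros
    print(reconstructed)    # close to the original, but not bit-exact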

Perceptual Compression

Campbell-Robson Contrast Sensitivity Chart

Campbell-Robson Contrast Sensitivity Chart

Notice how the visual boundary between the bands and the smooth gray space falls toward the right end of the chart. This is why throwing away information at high spatial frequencies doesn't compromise image appearance much. Because there is a lot of information at high frequencies, this adds a lot to compression efficiency.

Theora




a free video codec

Theora Overview

a free video codec

same family as MPEG-1 and MPEG-2 video codecs

DCT + block motion compensation

Huffman coding

Theora Features

Distinguishing features

  • arbitrary frame sizes
  • only uses backward references
  • 'blurry' not 'blocky'
  • accurate colour

Colour spaces

Supports two specific colour spaces

  • Rec. 470BG ("PAL")
  • Rec. 470M ("NTSC")
  • "unknown"

We only support 2 specific colour spaces to limit decoder complexity. These two options are at least close to most source video, and the more difficult work of mapping particular source material can be done by the encoder. Decoders are then free to concentrate on just optimizing these two options for their particular display.

The 'unknown' colour space is usually used when the encoder can't be bothered. Oddly, it can actually look "better" because it doesn't artificially limit the dynamic range, though of course the colour rendering cannot be as accurate as the specific spaces.

Packet structure

Theora creates packets of compressed data

theora bitstream packet structure

Packet structure

info header

theora bitstream packet structure

identifies the data as Theora

records basic configuration information: frame size, frame rate, maximal key frame spacing, etc.

Packet structure

comment header

theora bitstream packet structure

Simple (tag, value) metadata about the video.

Same format as the Vorbis comment metadata header.

Useful for quick author/title notes, but not full credits.
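
For illustration, a hypothetical set of tags in Python; the field names follow the usual Vorbis-comment conventions, and the values are made up:

    # Hypothetical (tag, value) metadata such as a comment header might carry.
    comments = {
        'TITLE':    'Street scene, Prague',
        'ARTIST':   'Xiph.org Foundation',
        'DATE':     '2004',
        'LOCATION': 'Prague',
    }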

Packet structure

table header

theora bitstream packet structure

Detailed configuration data for the decoder:

  • quantization matrices
  • Huffman tables
  • deringing filter

Packet structure

Key frame packet

theora bitstream packet structure

Contains a complete image without reference to previous frames.

The Theora equivalent of JPEG.

Packet structure

Delta frame packet

theora bitstream packet structure

Most frames are specified relative to two earlier frames:

  • the frame immediately preceding
  • the previous key frame

One frame per packet. Always.

Frame Decode

How is a frame actually decoded?

  • Is this a key frame or a delta frame?
  • Read the list of blocks that have been coded.
  • Read the list of coding modes for each new block.
  • Read the motion vector data and the quant matrix selectors.
  • Read the DCT coefficients for the residues.

Frame Decode

Having read all the coded data:

  • Predict each coded block by copying from earlier referenced blocks.
  • Predict the 0th DCT coefficient from the values of surrounding blocks.
  • Transform the residue data with the iDCT and add it to the predicted block.
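
A very rough Python-style sketch of that last step (this is not libtheora code; reference_block, residue_coeffs and the idct2 helper are stand-ins for the real decoder machinery):

    # Reconstruct one block of a delta frame: take the predictor copied from a
    # reference frame, add the inverse-transformed residue, and clamp to the
    # valid pixel range.
    def reconstruct_block(reference_block, residue_coeffs, idct2):
        residue = idct2(residue_coeffs)            # back from frequency space
        return [[min(255, max(0, p + r))           # prediction + residue, clamped
                 for p, r in zip(pred_row, res_row)]
                for pred_row, res_row in zip(reference_block, residue)]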

Ogg Theora

The Ogg container lets us mix Theora video with other content

Vorbis, Speex audio!

Ogg Theora

.ogg isn't just for Vorbis

Ogg Theora makes a complete multimedia format.

Ogg Theora

Ogg gives us seeking, streaming.

Use MNG for overlays.

Developing text-based subtitle formats.

Ogg Theora streams

Ogg Theora gives us easy HTTP streaming

  • apache
  • icecast

Other Theora streams

RTP payload format development just starting.

Other formats are possible. QuickTime?

Current Status

Reference implementation 1.0 alpha 3.

Stable code and bitstream format.

API will change before final release!

Current Status

Already have good playback support:

  • VLC
  • GStreamer
  • Xine
  • Helix/RealPlayer
  • DirectShow plug-ins
  • Fluendo Java applet
  • Debian packages!

Current Status

Encoder support:

  • ffmpeg2theora
  • reference example
  • RealProducer plug-ins (beta)
  • DirectShow plug-ins
  • Fluendo streaming server (unreleased)

Thanks!

Hope you enjoyed it.

Demo CD

We have some demo CD-ROMs. These contain Creative Commons licensed music and video in Vorbis, FLAC and Theora formats, along with the latest source code to our codecs.

We don't have enough for everybody, but if you don't get one or want one later, you can download the image from

http://people.xiph.org/~vanguardist/xiph-demo-alpha1.iso.