Daala Update 20141223

Daala is a next-next-generation video codec being developed at Mozilla Research and Xiph.Org. The aim is not only to create a better video codec, but to create one that is a royalty-free open standard that can serve as the backbone technology for video on the internet.

Daala development for much of this year concentrated on still-image performance. We've been evaluating progress by comparing against images compressed by JPEG, VP8, and other codecs.

Wait, what? Still images? Isn't Daala supposed to be a video codec? I'm so glad you asked!

Still Images and Stand-Alone Frames

Modern video codecs achieve most of their compression via inter-frame prediction, that is, by predicting the next frame from similar image content in other frames. Video usually changes rather little from one frame to the next, so transmitting only the changes saves a great deal of space.
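To make the idea of "transmitting only the changes" concrete, here is a minimal sketch in Python. It is not how Daala (or any real codec) works: real codecs use motion-compensated prediction rather than a plain per-pixel difference, and zlib is only a stand-in for a real entropy coder. The frame contents, the sizes, and the helper names (make_frame, residual, reconstruct) are all made up for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64  # hypothetical tiny grayscale frame


def make_frame(width, height, seed):
    """Build a noisy 8-bit grayscale frame (noise is hard to compress alone)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(width * height))


def residual(prev, cur):
    """Per-pixel difference between two frames, wrapped to 8 bits."""
    return bytes((c - p) % 256 for p, c in zip(prev, cur))


def reconstruct(prev, res):
    """Rebuild the current frame from the previous frame plus the residual."""
    return bytes((p + r) % 256 for p, r in zip(prev, res))


frame0 = make_frame(WIDTH, HEIGHT, seed=1)

# The next frame is identical except for one small 8x8 patch that changed.
changed = bytearray(frame0)
for y in range(8):
    for x in range(8):
        changed[y * WIDTH + x] = (changed[y * WIDTH + x] + 50) % 256
frame1 = bytes(changed)

res = residual(frame0, frame1)

# Coding the frame by itself versus coding only what changed.
standalone_size = len(zlib.compress(frame1))
predicted_size = len(zlib.compress(res))

print("stand-alone frame:", standalone_size, "bytes")
print("residual only:    ", predicted_size, "bytes")
```

The residual is almost entirely zeros, so it compresses far better than the frame itself, and the decoder can still rebuild the frame exactly from the previous frame plus the residual.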

Of course, the first frame in a stream has to come from somewhere. Video streams also need regular restart points for seeking. The usual method for doing this is to encode stand-alone keyframes, which are essentially still images sprinkled throughout the video stream.

Keyframes, by definition, can't use any inter-frame prediction. As a result, keyframes are typically much larger than other frames. Although a video stream contains relatively few keyframes, they make up a disproportionately large amount of the stream. Therefore, it's worth spending effort to increase the still-image performance of video codecs.
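A quick back-of-the-envelope calculation shows what "disproportionately large" can mean. The numbers below are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from Daala or any other codec, and keyframe_share is a name invented for this sketch.

```python
def keyframe_share(size_ratio, interval):
    """Fraction of a stream's bits spent on keyframes.

    size_ratio: keyframe size divided by the size of a predicted frame
                (hypothetical value).
    interval:   total frames per keyframe, i.e. one keyframe followed by
                interval - 1 predicted frames (hypothetical value).
    """
    if size_ratio <= 0 or interval < 1:
        raise ValueError("size_ratio must be > 0 and interval must be >= 1")
    keyframe_bits = size_ratio          # measured in "predicted frame" units
    predicted_bits = interval - 1       # each predicted frame costs 1 unit
    return keyframe_bits / (keyframe_bits + predicted_bits)


# Assume a keyframe is 10x the size of a predicted frame, one every 100 frames.
share = keyframe_share(size_ratio=10.0, interval=100)
frame_fraction = 1 / 100

print(f"keyframes are {frame_fraction:.1%} of frames "
      f"but {share:.1%} of the bits")
```

Under these assumed numbers, keyframes account for a far larger share of the bits than of the frames, which is why shrinking them pays off.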

Image coding performance is an area where innovation has stagnated since H.264, the current market-leading video codec. HEVC, H.264's successor, adds more prediction modes, but no substantially new techniques. As a result, HEVC, though very good, improves still-image performance less than previous codec generations did.

The Daala team needs to advance the state of the art in video compression and, necessarily, in image compression. An obvious early goal would be to outperform other codecs on still images, which is a subset of the performance improvements needed to field a strong video codec standard.

Alien Technology from the Future

HEVC sets the bar high for image performance, but it turns out that just improving performance over the decades-old JPEG is surprisingly hard. As Daala's lead researcher Tim Terriberry likes to say, "JPEG is alien technology from the future". It's an example of the kind of implausibly good performance that results from getting the minute details of a standard just right.

Surpassing JPEG without massively increasing complexity is quite difficult. It's not clear that other codecs entirely succeed in doing so (see Mozilla Research's recent codec comparison). We face the same challenge.

Have a Look!

We've improved intra-frame performance of Daala over the last few months to the point where we believe it exceeds JPEG and VP8, catches up to H.264, and gets us more than halfway to HEVC. Of course, work continues to surpass HEVC as well.

Below is a quick demo that characterizes the differences among the codecs. As you move the slider back and forth, notice especially how the details change in the trees, sky, and sand. Daala preserves texture well, like JPEG, but doesn't suffer from JPEG's blocking artifacts. VP8 avoids blocking artifacts, but blurs all but the strongest edges and textures. H.264 makes an especially strong showing in this image, though it oversharpens fine edges while losing lower-contrast edges and texture. Daala and HEVC are the most visually similar, with consistent treatment of features across the image. HEVC is still the clear winner for now, though that's not to say HEVC is flawless; oddly, it manages to 'trim' the trees shorter in the background! Currently, Daala's primary fault is ringing, which also impacts overall coding efficiency.


Above: An interactive split-screen comparison of relatively low-rate JPEG, Daala (master as of December 16, 2014), VP8, VP9, H.264 (using the x264 encoder), and HEVC (using the x265 encoder), all at identical output rate, alongside the original image. Mouse motion moves the position of the split left and right; mouse clicks enable and disable split screen movement.

You can also click here for a larger version of this demo with all of the full-resolution images in Xiph's standard test image subset 1.

A single image says only so much! I've also constructed a larger version of this demo with 50 test images at full resolution; be aware that this bigger demo involves lots of large PNG images.

Try Daala on your own images using an in-browser JavaScript Daala encoder!

Some Number of Plots (of Numbers)

Image and video codecs, unlike audio codecs, are primarily evaluated by automated objective metrics. Unfortunately, objective metrics are of strictly limited use when comparing different codecs. They tend to prefer the codecs most algorithmically similar to themselves, and those built-in algorithmic biases swamp the absolute results.

That said, automated metrics are useful and appropriate for showing relative improvements in a given codec (metrics may not be accurate, but they are precise). And although comparing different codecs is generally fraught with peril, high scores on a metric biased against a codec, or high scores across all metrics, do mean something.
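For readers who haven't met an objective metric before, here is a minimal sketch of PSNR, one of the simplest and most widely used. This text doesn't say which metrics the plots use, so PSNR is an assumption picked purely as an example; the pixel values and names (original, decode_old, decode_new) are invented.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return total / len(reference)


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(reference, decoded)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / err)


# Hypothetical 8-bit pixel values: the same source decoded by two builds
# of one codec at the same rate.
original = [10, 20, 30, 40]
decode_old = [14, 16, 30, 40]   # larger errors
decode_new = [11, 19, 30, 40]   # smaller errors

psnr_old = psnr(original, decode_old)
psnr_new = psnr(original, decode_new)

print(f"old build: {psnr_old:.2f} dB, new build: {psnr_new:.2f} dB")
```

Used this way, to compare two versions of the same codec, the number is a repeatable yardstick. It says much less when the two decodes come from codecs with very different kinds of artifacts.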

Here you can see the change in Daala's performance since January plotted against other well-known codecs. These metrics were collected over many bitrates across 50 of our test images, each of which is about a megapixel in size. The shaded area shows the gains our team made.

Want to know more?

Ongoing documentation of many of the novel technical details about how Daala works can be found on the Daala site and in the Daala demo pages. The first demo describes lapped transforms, Daala's method of avoiding blocking artifacts without using an edge-blurring loop filter. The second demo investigates performing intra-frame prediction (used in our keyframe encoding) entirely in the frequency domain. Next, the third demo shows off Daala's novel means of merging and splitting transform blocks. The fourth demo to date demonstrates predicting color using the black-and-white portion of the image. Demo 5 covers intra-paint, originally a directional prediction engine that's now used as a deringing filter. Finally (for now), the sixth demo documents PVQ, the perceptual quantization and coding engine that decides what bits get coded and how.

You can also join the Daala mailing list or visit us on IRC in #daala on irc.freenode.net. We're always looking for new collaborators.

—Monty (monty@xiph.org) December 16, 2014

This update was originally written in July but not released for a number of mundane reasons. I'm aware it had been noticed and posted in several places. This 'final release' version of the update is substantially the same as the July version, but it's been updated using image and metric runs with all codecs up to date as of December 16, 2014.

Monty's Daala documentation work is sponsored by Mozilla Research.
(C) Copyright 2014 Mozilla and Xiph.Org