"Theora" is the Ogg video codec built from the VP3 codec that On2 released to the open source world in 2002. Although the original encoder is now maintained by Xiph.Org and has been updated to the Theora specification, it has not been improved beyond its historical coding efficiency, which is low by modern standards. The "Thusnelda" encoder project [source available in Xiph.Org SVN] addresses the inadequacies of the reference encoder by building a new, modern encoder incrementally from the inherited encoder.
Thusnelda is an incremental update project in order to avoid the problems faced by theora-exp and, to some extent, Dirac: video coding is an inherently heuristic and highly tuned process where small but important changes can make a large difference in the efficiency of an encoder. Writing a complete new encoder from scratch is a small task compared to the time required to then tune that new encoder into efficient operation. Incremental changes allow demonstrable, steady progress.
For the most part, the extensive code restructuring throughout the encoder is not an improvement in its own right, but enables improvements. That said, it has had the effect of improving data colocation and so decreasing cache misses. In all, the Thusnelda encoder is under 60% of the code size of the original Theora encoder at this point. Execution speed has improved, but the amount of improvement depends on the amount of motion in the video sequence. Thusnelda can reasonably be expected to be between 1.5x and 4x faster than the current mainline, despite the fact that Thusnelda currently encodes all blocks (on average, mainline codes fewer than half) and does not make as extensive use of assembly as the mainline.
The majority of the time and effort spent on the Thusnelda encoder since the last update has gone into completely replacing both the motion estimation and mode selection code.
Theora playback frame with motion vector and macroblock debugging 'telemetry' active. The telemetry feature allows libtheora to write debugging information directly into the video frame output.
The decoder included with Thusnelda can render debugging information into the output frame. This functionality is available through the theora_control call; see decoder_example.c for an illustration.
This patch adds a -theoradopts command line option to mplayer for enabling debugging telemetry. The numeric option is a bitmask selecting which block types to display; the block types are:
0x01 == CODE_INTER_NO_MV
0x02 == CODE_INTRA
0x04 == CODE_INTER_PLUS_MV
0x08 == CODE_INTER_LAST_MV
0x10 == CODE_INTER_PRIOR_LAST
0x20 == CODE_USING_GOLDEN
0x40 == CODE_GOLDEN_MV
0x80 == CODE_INTER_FOURMV

For example, mplayer -theoradopts vismv=0xff file.ogg would display all motion vectors, coded and implicit. mplayer -theoradopts vismv=0xff:vismbmode=0xdc file.ogg would display all blocks with a motion vector.
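As a quick check on the bitflag arithmetic, the following C sketch rebuilds the 0xdc mask from the individual block-type bits and formats the option string used in the example. The bit values are copied from the list above; the VIS_ enum names and both helper functions are hypothetical and exist only for this illustration (they are not part of libtheora or the mplayer patch).

```c
#include <assert.h>
#include <stdio.h>

/* Bit values as listed in the text. The VIS_ names are a hypothetical
 * naming choice for this sketch, not identifiers from libtheora. */
enum vis_mode_flag {
    VIS_INTER_NO_MV      = 0x01,
    VIS_INTRA            = 0x02,
    VIS_INTER_PLUS_MV    = 0x04,
    VIS_INTER_LAST_MV    = 0x08,
    VIS_INTER_PRIOR_LAST = 0x10,
    VIS_USING_GOLDEN     = 0x20,
    VIS_GOLDEN_MV        = 0x40,
    VIS_INTER_FOURMV     = 0x80
};

/* The five block types that make up the "all blocks with a motion
 * vector" mask given in the text (0xdc). */
unsigned vis_modes_with_mv(void)
{
    return VIS_INTER_PLUS_MV | VIS_INTER_LAST_MV | VIS_INTER_PRIOR_LAST
         | VIS_GOLDEN_MV | VIS_INTER_FOURMV;
}

/* Format the argument that follows -theoradopts. Returns the value
 * reported by snprintf (the untruncated string length). */
int vis_format_opts(char *buf, size_t len, unsigned mv_mask, unsigned mode_mask)
{
    assert(buf != NULL && len > 0);
    assert(mv_mask <= 0xff && mode_mask <= 0xff);
    return snprintf(buf, len, "vismv=0x%02x:vismbmode=0x%02x",
                    mv_mask, mode_mask);
}
```

Calling vis_format_opts(buf, sizeof buf, 0xff, vis_modes_with_mv()) produces the string vismv=0xff:vismbmode=0xdc, matching the second mplayer example.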
Mike Smith has provided a similar patch for GStreamer.
The primary intent of the MV-specific work was to increase speed. Motion estimation in the mainline Theora encoder accounts for half or more of encoding time.
The previous generation of motion estimation techniques concentrated on optimizing exhaustive searches of the solution space. Because an exhaustive search is slow, these encoders typically performed a partial search first, and if the results of the partial search failed to reach a particular gain threshold, the encoder then performed an exhaustive search.
This strategy has the obvious drawback of inefficiency (either the answer is incomplete or expensive) and a second, less obvious drawback. Encoders rely heavily on reusing previously coded motion vectors (to save bits by not recoding a close match). An exhaustive search has no predisposition to prefer previously found vectors, or to prefer 'close matches' that would be more cost effective to code.
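The older two-stage strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the mainline Theora code: the names (mv_t, mv_cost_fn, toy_cost, two_stage_search) are hypothetical, the 'gain threshold' is simplified to a threshold on matching cost, the partial search is modelled as a coarse grid scan, and toy_cost stands in for a real block-error measure.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;

/* Matching cost of a candidate vector; lower is better. */
typedef unsigned (*mv_cost_fn)(mv_t mv, const void *ctx);

/* Toy stand-in cost: city-block distance from a known "true" motion
 * passed in ctx. A real encoder would measure block error instead. */
unsigned toy_cost(mv_t mv, const void *ctx)
{
    const mv_t *truth = ctx;
    return (unsigned)(abs(mv.x - truth->x) + abs(mv.y - truth->y));
}

/* Scan the square [-range, range] x [-range, range] with the given
 * stride and return the cheapest vector found. */
static mv_t scan(mv_cost_fn cost, const void *ctx, int range, int stride,
                 unsigned *best_cost)
{
    mv_t best = {0, 0};
    *best_cost = UINT_MAX;
    for (int y = -range; y <= range; y += stride) {
        for (int x = -range; x <= range; x += stride) {
            mv_t mv = {x, y};
            unsigned c = cost(mv, ctx);
            if (c < *best_cost) { *best_cost = c; best = mv; }
        }
    }
    return best;
}

/* Partial (coarse) search first; fall back to the exhaustive search
 * only when the coarse result is worse than the threshold. */
mv_t two_stage_search(mv_cost_fn cost, const void *ctx, int range,
                      int stride, unsigned threshold,
                      unsigned *best_cost, int *used_full)
{
    assert(range > 0 && stride > 0);
    mv_t best = scan(cost, ctx, range, stride, best_cost);
    *used_full = *best_cost > threshold;
    if (*used_full)
        best = scan(cost, ctx, range, 1, best_cost);
    return best;
}
```

Note that neither stage gives any preference to vectors that were already coded for neighboring blocks, which is the second drawback mentioned above.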
'Zonal Search' algorithms base motion vector searches on refinements of motion vectors from previous frames as well as neighboring blocks from the current frame. The new zonal search algorithm in Thusnelda always performs a complete search, but currently has a worst case speed roughly equal to Theora's best case. Thusnelda is roughly ten times faster than Theora's worst case. An additional optimization (see future work below) could potentially double performance again.
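To make the general idea concrete, here is a minimal C sketch of a predictor-driven search: it evaluates a handful of candidate vectors (for example from neighboring blocks and the previous frame), then refines the best one with small steps until nothing improves. It is an illustration only and does not reproduce Thusnelda's actual algorithm; the 8x8 block size, the SAD error measure, the four-point refinement pattern, and all names (plane_t, mv_t, block_sad, zonal_search) are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BLK 8  /* block edge in pixels (assumed for this sketch) */

typedef struct { int x, y; } mv_t;
typedef struct { const unsigned char *pix; int w, h; } plane_t;

/* Sum of absolute differences between the block at (bx, by) in cur and
 * the block displaced by mv in ref. Returns UINT_MAX when the displaced
 * block would fall outside the reference frame. */
unsigned block_sad(const plane_t *cur, const plane_t *ref,
                   int bx, int by, mv_t mv)
{
    int rx = bx + mv.x, ry = by + mv.y;
    unsigned sad = 0;
    if (rx < 0 || ry < 0 || rx + BLK > ref->w || ry + BLK > ref->h)
        return UINT_MAX;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sad += (unsigned)abs(cur->pix[(by + y) * cur->w + bx + x]
                               - ref->pix[(ry + y) * ref->w + rx + x]);
    return sad;
}

/* cands holds predictor vectors supplied by the caller. */
mv_t zonal_search(const plane_t *cur, const plane_t *ref, int bx, int by,
                  const mv_t *cands, int ncands, unsigned *best_sad)
{
    static const mv_t step[4] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
    mv_t best = {0, 0};
    int improved = 1;

    assert(cur->w == ref->w && cur->h == ref->h);
    assert(bx >= 0 && by >= 0 && bx + BLK <= cur->w && by + BLK <= cur->h);

    /* Start from the zero vector, which is always inside the frame. */
    *best_sad = block_sad(cur, ref, bx, by, best);

    /* Stage 1: try each predictor and keep the cheapest. */
    for (int i = 0; i < ncands; i++) {
        unsigned sad = block_sad(cur, ref, bx, by, cands[i]);
        if (sad < *best_sad) { *best_sad = sad; best = cands[i]; }
    }

    /* Stage 2: refine around the current best until no neighbor wins.
     * The SAD strictly decreases each round, so the loop terminates. */
    while (improved) {
        mv_t center = best;
        improved = 0;
        for (int i = 0; i < 4; i++) {
            mv_t t = { center.x + step[i].x, center.y + step[i].y };
            unsigned sad = block_sad(cur, ref, bx, by, t);
            if (sad < *best_sad) { *best_sad = sad; best = t; improved = 1; }
        }
    }
    return best;
}
```

Because the search starts from vectors that have already been chosen nearby, the result tends to stay close to previously coded vectors, which is the property the exhaustive approach lacks.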
A quick example of old and new MV coding appears below. The 'embedded telemetry' versions of the videos are recoded from the originals with the debugging output included as part of the encoded file, so that folks who don't want to install the vis patches just to enjoy this little demo don't need to. They're much larger than the originals, mainly to preserve the clarity of the debugging symbols.
Left: example of motion vectors in a frame of video from a mainline Theora encode. The original video is here, and a version with embedded telemetry is here.
Left: example of motion vectors in a frame of video from the Thusnelda encoder. The original video is here, and a version with embedded telemetry is here.
The primary intent of optimizing block type selection (mode select) was to increase coding efficiency. The new MV and mode selection code alone, compared directly to the algorithm used by mainline Theora, currently improves bitrate coding efficiency by approximately 10%.
Left: example of mode selection in a frame of video from a mainline Theora encode. The original video is here, and a version with embedded telemetry is here.
Left: example of mode selection in a frame of video from a mainline Theora encode. The encoder in question here has been modified to code all blocks and disable ZeroBin, so that the only differences being compared between mainline Theora and Thusnelda are the motion and mode code. The original video is here, and a version with embedded telemetry is here.
Left: example of mode selection in a frame of video from the Thusnelda encoder. The original video is here, and a version with embedded telemetry is here.
Above: The new motion estimation code incorrectly chooses motion vectors following the frame boundary when the scene is flowing into the frame from the edge.