For music encoding, Opus has already been shown to out-perform other audio codecs at both 64 kb/s and 96 kb/s. We originally thought that 64 kb/s was near the lowest bitrate at which Opus could be useful for streaming stereo music. However, with the variable bitrate (VBR) improvements in Opus 1.1, 48 kb/s suddenly became a realistic target. Opus 1.2 continues on the path to lowering the bitrate limit. Music at 48 kb/s is now quite usable, and while the artefacts are generally audible, they are rarely annoying. Even more, we've actually been pushing all the way to fullband stereo at just 32 kb/s!
Most of the music encoding quality improvements in 1.2 don't come from big new features (like tonality analysis that got added to version 1.1), but from many small changes that all add up. The process so far has mostly been along these lines:
This is how we got some adjustments to the bit allocation trim, an improved tonality analysis that now has better frequency resolution (while taking less CPU!), as well as quality improvements on signals with a few very powerful tones.
In other cases, we just found better ways to optimize encoding on all signals. This is the case for the improved stereo search in 1.2. When using mid-side stereo, the Opus encoder needs to compute a stereo width parameter, quantize it, and encode it to the bit-stream. Rather than quantizing to the closest value, the 1.2 encoder will now (only at higher complexity settings) actually try the two closest values and pick whichever minimizes distortion. It's not a huge gain, but when you add many of those, they add up to a significant improvement.
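The "try the two closest values" idea can be sketched in a few lines. This is only an illustration, not the Opus encoder's code: the uniform quantizer grid, the function names, and the `toy_distortion` measure (which penalizes overshoot more than undershoot) are all hypothetical stand-ins for whatever the encoder actually measures.

```python
import math

def quantize_nearest(value, step):
    """Baseline: index of the grid point closest to value."""
    return round(value / step)

def quantize_search(value, step, distortion):
    """Try the two grid points bracketing value and keep the one
    that gives the lower distortion."""
    lo = math.floor(value / step)
    candidates = (lo, lo + 1)
    return min(candidates, key=lambda idx: distortion(idx * step))

def toy_distortion(target, overshoot_penalty=4.0):
    """Hypothetical asymmetric distortion: squared error, weighted
    more heavily when the quantized value exceeds the target."""
    def distortion(q):
        err = (q - target) ** 2
        return err * overshoot_penalty if q > target else err
    return distortion

width = 0.6  # hypothetical unquantized stereo width parameter
print(quantize_nearest(width, 1.0))                        # closest grid point
print(quantize_search(width, 1.0, toy_distortion(width)))  # lower-distortion grid point
```

With a symmetric distortion measure the two functions agree; the search only pays off when the real distortion is not a simple function of the distance to the grid point, which is why the gain is small but non-zero.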
One change that does make a large difference all by itself is the low bitrate VBR changes. In previous versions (up to 1.1.x), the VBR code has always been conservative about low bitrates. The reasoning was that when you have so few bits, you can't afford to further reduce the bitrate in some sections just so you can improve more demanding sections. After lots of experiments, that reasoning was proven wrong and now the Opus 1.2 encoder makes full use of VBR even down to 32 kb/s.
Of course, you don't need to take our word for it. Here's a comparison between Opus versions 1.0, 1.1, and 1.2 so you can hear for yourself how the quality has improved and how Opus now sounds in general. As an anchor (OK, and also to make us look good!), we've also included MP3 samples.
Opus 1.2 also pushes the boundary further when it comes to speech encoding. It brings many improvements to the SILK encoder, many of which actually make it simpler at the same time. The most noticeable speech quality improvements however come from tuning made to the hybrid mode. Hybrid mode is when SILK is used to encode speech frequencies up to 8 kHz while CELT is used to encode the remaining frequencies, from 8 to 20 kHz. It is one of the main reasons Opus is better than the sum of its parts. Through most of the Opus development, hybrid mode has been used mostly at bitrates around 32 kb/s, so there were always plenty of bits for the CELT layer. But for 1.2 we're pushing hybrid mode fullband speech coding all the way down to 16 kb/s. At that bitrate, every single bit counts, so we have to optimize the CELT layer to do a good job with very few bits. We also need to make sure that it gets just the right number of bits, since we don't want to starve the SILK layer which encodes the most important part of the speech.
The CELT encoder has multiple psychoacoustic tools it can use to maximize audio quality. The decisions on how to use them have so far been mostly tuned for music, where CELT is used at all frequencies, and not for hybrid, where it is only used for a few frequency bands. Version 1.2 adds hybrid-specific tuning for both spreading and time-frequency resolution switching. It also completely disables the use of the allocation trim, which can use many bits while not being very useful for just a few bands. All these improvements allow the Opus encoder to switch to fullband at a lower rate than it originally did. In 1.0, the encoder would only start coding speech in fullband mode at 29 kb/s. That threshold got reduced to 21 kb/s in version 1.1, and now Opus will actually use fullband starting at only 14 kb/s.
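As a quick way to compare the three versions, the thresholds quoted above can be put in a small lookup. This is a simplified sketch: the function name is made up for illustration, and treating the decision as a single hard threshold is an assumption, since the real encoder's bandwidth decision may depend on more than the bitrate.

```python
# Bitrates (kb/s) at which each version starts coding speech in
# fullband, as quoted in the text above.
FULLBAND_SPEECH_THRESHOLD_KBPS = {"1.0": 29, "1.1": 21, "1.2": 14}

def codes_speech_in_fullband(version, bitrate_kbps):
    """Simplified model: fullband is used at or above the threshold."""
    return bitrate_kbps >= FULLBAND_SPEECH_THRESHOLD_KBPS[version]

for version in ("1.0", "1.1", "1.2"):
    print(version, codes_speech_in_fullband(version, 16))
```

At 16 kb/s, the bitrate targeted by the hybrid-mode tuning, only the 1.2 entry is above its threshold.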
Again, don't just take our word for it, but also listen to some speech samples. Here's another quality comparison between 1.0, 1.1, and 1.2 — this time for speech. This time, the let's-try-to-look-good anchor is the venerable Speex coder (version 1.2). This also shows free codecs have come a long way since 2003.
Opus 1.2 has seen a large number of speed-related improvements. Despite many of the quality improvements listed above requiring extra CPU, Opus 1.2 is faster than any previous release. This is the result of a large number of optimizations that were merged during 1.2 development. Those include generic simplifications, as well as x86-specific (SSEx) and ARM-specific (mostly Neon) optimizations. Since those are based on run-time CPU detection, they are safe to enable at build-time (which is the default) even for older CPUs. See below how the overall speed improved compared to 1.0 and 1.1. Note that the few regressions are due to the cost of the new quality improvement changes enabled at high complexity settings. Also, the absence of a bar for version 1.0 for music at complexity 9 is due to that version not implementing any additional features above complexity 5.
Opus is now widely deployed on a large range of platforms and devices (including Android, iOS, and all major browsers) and is exposed to untrusted data. Because of that, it's critical to ensure no malicious audio file or malicious VoIP packet can cause any damage to the receiver. Opus has had all kinds of random packet tests since before version 1.0, but in the past year, we have increased the amount and variety of testing. We have been testing extensively with the gcc address and undefined-behaviour sanitizers, fuzz-testing with American Fuzzy Lop, and recently joined the Google OSS-Fuzz project. While no major security issues were found, we discovered and fixed several minor issues, improving the code quality in the process, especially for fixed-point. We have also been adopting the Core Infrastructure Initiative best practices. See the badge below for details:
Standard documents are never perfect and over time, minor errors always get discovered — and fixed. Opus is no exception, and over the last 5 years since it became a standard, a few issues have been found. Most are minor bugs in the reference decoder that do not affect the test vectors that define strict compliance with the Opus specification. Those issues were fixed in the stable Opus versions (1.0.x, 1.1.x and now 1.2) as they were found. However, the fixes for two of those issues do affect the test vectors, and thus strict compliance with the standard (RFC 6716). This is why an IETF draft will soon update the standard. Since the document is not yet officially adopted by the IETF, the corresponding minor fixes are not yet enabled by default. They can be enabled by adding the --enable-update-draft option to the configure script when compiling. That being said, this is not a huge deal since the cases where the fixes make a difference are relatively rare and they do not cause any compatibility issues. Below is a description of the two fixes that are enabled by --enable-update-draft.
When originally designing Opus, we were amazed that we were able to encode fullband (48 kHz) speech at just 32 kb/s. We thought with some tuning we could get down to 28 kb/s or maybe even 24 kb/s, but we never considered going as low as 16 kb/s. Yet here we are 5 years later and 16 kb/s fullband speech is becoming viable. This is the result of tracking down small artefacts one by one and making them less audible. In one case though, the artefact was due to the fact that at these extremely low rates, there are only enough bits to code frequencies up to 9.6 kHz, which prevents the Opus folding feature from working and causes noise to be used instead. Since the noise has the correct energy, it is often inaudible... but sometimes it is audible. Fortunately, we found a way to "fix" folding in a way that causes no compatibility issue. Decoders that have the fix will sound slightly better at low bitrate, but the older decoders will not suffer. Right now the change is only enabled with --enable-update-draft, but it will become the default when the IETF update to Opus becomes an RFC.
When the bitrate becomes too low to directly code the stereo image using mid-side (MS) stereo, Opus falls back to a technique called intensity stereo (IS) at higher frequencies. This means that the left and right channels in a certain frequency band are derived from the same signal, but with different energy. To slightly improve the quality of IS, the format has a way to optionally make the two channels the inverse of each other. This slightly improves quality when the two channels are very different (inversely correlated). The only drawback of that feature is that when one downmixes a file that uses IS to mono, the left and right channels can sometimes partially cancel each other, which can be annoying. The Opus encoder now has an option to disable this feature during the encoding process, at the cost of a slight degradation in quality at low bitrate. A better option is for the decoder to optionally ignore the inversion flags when it knows that the signal will be downmixed to mono. That way those using stereo still get the benefits of the feature and those downmixing don't get penalized. Since a decoder ignoring these flags is not technically compatible with the current state of RFC 6716, the feature is only enabled with --enable-update-draft until the IETF update to Opus becomes an RFC.
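The cancellation problem is easy to see with a toy model. The sketch below is not how the Opus decoder is written: it works on a plain list of samples rather than on the codec's frequency-band coefficients, and the function names, gains, and `ignore_inversion` switch are hypothetical, chosen only to mirror the behaviour described above.

```python
def intensity_stereo_decode(band, gain_left, gain_right,
                            inverted, ignore_inversion=False):
    """Derive left/right from one shared signal with per-channel gains.
    When the inversion flag is honoured, the right channel is negated."""
    sign = -1.0 if (inverted and not ignore_inversion) else 1.0
    left = [gain_left * x for x in band]
    right = [sign * gain_right * x for x in band]
    return left, right

def downmix_to_mono(left, right):
    """Plain average of the two channels."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def energy(signal):
    return sum(x * x for x in signal)

band = [1.0, -0.5, 0.25, 0.75]  # hypothetical shared band signal

# Stereo playback path: honour the flag, channels are inverted copies.
left, right = intensity_stereo_decode(band, 0.8, 0.6, inverted=True)
mono_cancelled = downmix_to_mono(left, right)

# Mono playback path: the decoder knows it will downmix, so it
# ignores the flag and the channels add up instead of cancelling.
left, right = intensity_stereo_decode(band, 0.8, 0.6, inverted=True,
                                      ignore_inversion=True)
mono_preserved = downmix_to_mono(left, right)

print(energy(mono_cancelled), energy(mono_preserved))
```

With gains of 0.8 and 0.6, honouring the flag leaves the mono downmix at about 0.1 times the shared signal, while ignoring it gives about 0.7 times, so most of the band's energy survives the downmix.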
Opus 1.2 also includes a few more features and improvements that don't fit in any of the above sections. Here they are:
The Opus specification supports packet sizes up to 120 ms, but until recently the encoder could only produce packets up to 60 ms long. Generating 120-ms packets would require using the Opus repacketizer to concatenate two 60-ms packets. The 1.2 encoder now adds support for encoding packets of 80, 100, and 120 ms. Those packets don't have better quality than smaller packets; in fact, they are slightly worse. Their only advantage is for voice-over-IP at extremely low bitrates, because longer packets mean fewer packets and lower overhead from the IP/UDP/RTP headers. Most people will never need it, but in regions with poor, low-bitrate connectivity, this can be very useful to support since it can make the difference between high-latency communication and no communication at all.
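To put numbers on the header overhead, here is a back-of-the-envelope calculation. It assumes the minimal header sizes (IPv4 without options, UDP, and RTP without CSRC entries or extensions) and ignores any link-layer framing, so real overhead will usually be somewhat higher.

```python
# Minimal header sizes in bytes (assumption: no options or extensions).
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES

def header_overhead_kbps(packet_ms, header_bytes=HEADER_BYTES):
    """Bitrate spent on headers when one packet is sent every packet_ms."""
    packets_per_second = 1000.0 / packet_ms
    return packets_per_second * header_bytes * 8 / 1000.0

for packet_ms in (20, 60, 120):
    print(f"{packet_ms:3d} ms packets: "
          f"{header_overhead_kbps(packet_ms):.2f} kb/s of headers")
```

Under these assumptions, 20-ms packets spend 16 kb/s on headers alone, which is as much as the audio itself for 16 kb/s speech. Moving to 120-ms packets brings that down to about 2.67 kb/s.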
Opus defaults to variable bitrate (VBR), but for a variety of reasons some people prefer to use constant bitrate (CBR), which it has always supported. Opus 1.2 includes some improvements to CBR quality, especially for low bitrate speech. Also improved is forward-error correction (FEC), which is useful in conditions of high packet loss. In version 1.2, FEC can now operate at lower bitrates than it used to because the encoder can choose to reduce the audio bandwidth to enable FEC in high-loss conditions. Also, the hybrid mode bit allocation is now aware of FEC and will properly take it into account when dividing the bits between SILK and CELT. This makes it possible to use FEC in fullband mode at bitrates as low as 24 kb/s.
There is ongoing work at the IETF to define an ambisonics mapping for Opus. That new mapping isn't formally standardized yet, but Opus 1.2 currently supports the latest version of the draft for direct coding of ambisonics channels (mapping family 2), with the matrix-based coding method (mapping family 3) still in development.
Ambisonics is a technique for representing spatial audio based on spherical harmonics rather than relying on fixed loudspeaker locations (like 5.1 surround). It was never very popular compared to other formats for applications like movies. So why care about Opus ambisonics? The answer is virtual reality (VR). With VR we no longer have fixed speakers. What we want is a representation of the entire sound field that can then be converted to a stereo signal based on the correct orientation of the user's head. It turns out that representing sound fields is something that Ambisonics is good at. So expect to see Opus getting more popular for VR.
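To make the head-tracking idea concrete, here is a minimal horizontal-only, first-order sketch. Everything in it is an assumption made for illustration: it uses the traditional B-format convention (W scaled by 1/sqrt(2)), which is not necessarily the channel ordering or normalization used by the Opus draft, and it renders stereo with two virtual cardioid microphones where a real VR renderer would typically use head-related transfer functions.

```python
import math

def encode_first_order(sample, azimuth_rad):
    """Encode a mono sample arriving from azimuth_rad
    (0 = front, positive = towards the left) into W, X, Y."""
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth_rad)
    y = sample * math.sin(azimuth_rad)
    return w, x, y

def rotate_for_head_yaw(w, x, y, head_yaw_rad):
    """Rotate the sound field so directions become relative to a
    head that has turned by head_yaw_rad (positive = to the left)."""
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    return w, x * c + y * s, -x * s + y * c

def decode_to_stereo(w, x, y):
    """Two virtual cardioids pointing at +90 and -90 degrees."""
    left = 0.5 * (math.sqrt(2.0) * w + y)
    right = 0.5 * (math.sqrt(2.0) * w - y)
    return left, right

# A source directly to the listener's left.
field = encode_first_order(1.0, math.pi / 2)

# Facing forward: the source is heard in the left ear only.
print(decode_to_stereo(*rotate_for_head_yaw(*field, 0.0)))

# After turning the head 90 degrees to the left, the same encoded
# field puts the source in front, equally loud in both ears.
print(decode_to_stereo(*rotate_for_head_yaw(*field, math.pi / 2)))
```

The point is that the encoded field never changes; only the rotation applied at playback does, which is what makes the representation a good fit for head-tracked rendering.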
For Windows users, the main change in version 1.2 is that the build process now produces a single library (opus.lib) rather than multiple libraries for SILK, CELT, and so on. This should make it easier to deploy applications based on Opus.

Jean-Marc Valin (firstname.lastname@example.org), June 20, 2017