This is an overview of the LPCNet algorithm. The left part of the network (yellow) is computed once per frame, and its output is held constant throughout the frame as input to the sample-rate network on the right (blue). The compute prediction block predicts the sample at time t from the previous samples and the linear prediction coefficients.
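The compute prediction block is a plain linear predictor: the prediction for time t is a weighted sum of the previous M samples, p_t = a_1 s_{t-1} + a_2 s_{t-2} + ... + a_M s_{t-M}, using the sign convention in which the prediction is a positive weighted sum. The Python sketch below is illustrative only and is not taken from the LPCNet source. `lpc_predict`, `run_frame` and `sample_net` are hypothetical names, and `sample_net` is a placeholder for the sample-rate network that simply receives the prediction and the previous sample. The coefficients `a` stay fixed for the whole frame, mirroring how the frame-rate network's output is held constant.

```python
from typing import Callable, List, Sequence


def lpc_predict(history: Sequence[float], a: Sequence[float]) -> float:
    """Linear prediction p_t = sum_{k=1}^{M} a[k-1] * s[t-k].

    `history` holds past output samples, most recent last. Samples before
    the start of the signal are treated as zero.
    """
    p = 0.0
    for k in range(1, len(a) + 1):
        if k <= len(history):
            p += a[k - 1] * history[-k]
    return p


def run_frame(a: Sequence[float], history: List[float], frame_size: int,
              sample_net: Callable[[float, float], float]) -> List[float]:
    """Generate one frame of samples with per-frame coefficients `a`.

    `a` is held constant for every sample in the frame. `sample_net` is a
    hypothetical stand-in for the sample-rate network: it takes the
    prediction p_t and the previous sample and returns the next sample s_t.
    New samples are appended to `history` so later predictions can use them.
    """
    out = []
    for _ in range(frame_size):
        p_t = lpc_predict(history, a)
        prev = history[-1] if history else 0.0
        s_t = sample_net(p_t, prev)
        history.append(s_t)
        out.append(s_t)
    return out
```

As a quick sanity check, a 2nd-order predictor with a = [2cos(w), -1] and a `sample_net` that returns its prediction unchanged reproduces a sinusoid of angular frequency w, since sin((n+1)w) = 2cos(w) sin(nw) - sin((n-1)w). Assuming 16 kHz audio and 10 ms frames (an assumption for illustration), each call to `run_frame` would cover 160 samples.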

Introducing LPCNet

Neural speech synthesis models have recently demonstrated the ability to synthesize high-quality speech for text-to-speech and compression applications. These new models often require powerful GPUs to achieve real-time operation, so reducing their complexity would open the way for many new applications. We propose LPCNet, a WaveRNN variant that combines linear prediction with recurrent neural networks to significantly improve the efficiency of speech synthesis. We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size, and that high-quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS. This makes it easier to deploy neural synthesis applications on lower-power devices, such as embedded systems and mobile phones.

MUSHRA Test Results

We conducted a subjective listening test with a MUSHRA-derived methodology, in which 8 utterances (from 2 male and 2 female speakers) were each evaluated by 100 participants. The results below show that LPCNet's quality significantly exceeds that of WaveRNN at equal complexity. Equivalently, LPCNet can match WaveRNN's quality at significantly lower complexity.

Subjective quality (MUSHRA) results as a function of the number of units in the main GRU.

Hear For Yourself

Here are two of the samples that were used in the listening test above.

Comparing the speech synthesis quality of LPCNet with that of WaveRNN+. This demo works best in a browser that supports Ogg/Opus in HTML5 (Firefox, Chrome, and Opera do); if Opus is not supported, the file is played as FLAC, WAV, or high-bitrate MP3 instead.

Source Code

The LPCNet source code is available under the BSD license.

Update: The C version of the LPCNet122 model achieves real-time synthesis using 15-20% of a single core of a recent x86 CPU. It also achieves real-time synthesis on a single core of an iPhone 6s.

—Jean-Marc Valin (jmvalin@jmvalin.ca) October 29, 2018

Additional Resources

  1. J.-M. Valin, J. Skoglund, LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, submitted to ICASSP 2019.
  2. A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, WaveNet: A Generative Model for Raw Audio, 2016.
  3. N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande, E. Lockhart, F. Stimberg, A. van den Oord, S. Dieleman, K. Kavukcuoglu, Efficient Neural Audio Synthesis, 2018.
  4. W. B. Kleijn, F. S. C. Lim, A. Luebs, J. Skoglund, F. Stimberg, Q. Wang, T. C. Walters, WaveNet Based Low Rate Speech Coding, 2018.
  5. Join our development discussion in #opus at irc.freenode.net.

Jean-Marc's Opus documentation work is sponsored by the Mozilla Corporation.
(C) Copyright 2018 Mozilla and Xiph.Org