The process of chirp estimation fits a set of modulated sinusoids to a given audio signal. We pass in initial chirp parameter estimates and the estimator then improves these approximations to a desired level of fit precision. The parameters describing a chirp are the zeroth-order parameters P (phase) and A (amplitude), the first-order parameters W (frequency) and dA (amplitude modulation), and the second-order parameters dW (chirp rate, or frequency modulation) and ddA (second-order amplitude modulation).
At present, Ghost research uses a specific chirp estimation algorithm we published in 2007, and we'll likely continue using and improving this algorithm until something clearly better comes along. For this reason, it's essential we know what convergence and error to expect from it.
As alluded to in demo 2, the original 2007 paper had little space to devote to detailed algorithm characterization. The few graphs covered specific disjoint cases that were primarily concerned with efficiency as compared to other techniques. This data does not lead to a comprehensive, intuitive understanding of how the chirp estimation performs given varied input data and operating parameters.
In this update, we concern ourselves with documenting the expected behavior of the chirp estimator, as well as what to expect from several optional improvements alluded to but left unexplored by the 2007 paper. Because chirp estimation is of general use in the field of signal processing, we do not focus only on expected use cases for Ghost. This documentation also serves as a benchmark archive for testing. Future improvements and optimizations can be compared against these known good results.
The reference chirp estimator algorithm as well as code used to generate the graphs can be found in Xiph.Org's SVN at http://svn.xiph.org/trunk/chirptest. The module builds a standalone C application (using libcairo for drawing) named 'chirp' that when run produces all the graphs seen here. The application requires several hours of run time on a 32-core Opteron server to generate the graphs as seen here, so you probably don't want to try it on a netbook.
Except where noted, graphs are generated using the algorithms as described in the paper linked above. The paper contains an algebraic sign error on Page 11 in Algorithm 2; the 12th line has a '+' which should be a minus. Specifically it should read:
The algorithms are implemented in single precision float arithmetic with the exception of the instantaneous phase calculation in the basis chirp generation. A naive calculation such as cosf(P+(W+dW*i)*i) runs out of bits in the mantissa of the single-precision argument to cosf() as W and i increase. For the sake of simplicity and clarity, the naive calculation is used but implemented and passed in double precision.
The termination criterion for convergence is a simple magnitude threshold. The algorithm is declared to have converged when the sum of the squares of the basis projections for each chirp is less than 1e-12. This value is chosen to be large enough to sit comfortably above numerical noise (between 1e-16 and 1e-14), but small enough to show a substantial amount of the algorithm's depth of precision. The convergence graphs plot the number of iterations required before the final uncounted iteration results in a change below the termination threshold. The error graphs give the worst-case performance for our specific termination criterion across all values of any swept input.
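The termination test described above can be sketched as follows (a hypothetical layout assuming the per-chirp basis projections are stored contiguously; not the actual chirptest code):

```c
#include <stddef.h>

/* Convergence test following the magnitude-threshold criterion
   described in the text: iteration stops once the sum of squared
   basis projections for every chirp falls below 1e-12. */
static int converged(const float *projections, size_t n_chirps,
                     size_t n_bases){
  size_t i, j;
  for(i = 0; i < n_chirps; i++){
    float sum = 0.f;
    for(j = 0; j < n_bases; j++){
      float p = projections[i*n_bases + j];
      sum += p*p;
    }
    if(sum >= 1e-12f) return 0; /* this chirp is still moving */
  }
  return 1; /* all chirps below threshold */
}
```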
Keep in mind that 'convergence' simply means that the estimation algorithm arrived at a stable solution and halted. It does not mean that the solution is correct or sufficiently accurate. Consult both the convergence and error graphs to determine the trustworthy domain of operation.
The paper offers two variants of the chirp estimation algorithm, linear and non-linear. The linear and non-linear algorithms differ primarily in that the non-linear algorithm 'recenters' the W parameter (frequency) each iteration, that is, it regenerates the basis functions with an updated W estimate each iteration as well as restarting the residual error calculation. Basis generation is a relatively expensive operation, and the nonlinear estimator repeats basis generation for each chirp in each iteration. Thus, although the conceptual difference between the linear and non-linear algorithms is small, the computational cost is potentially substantial.
The non-linear algorithm in the paper does not recenter the dW (chirp rate) parameter. It also assumes an initial dW estimate of 0, thus avoiding the dW term in basis generation completely:
This is an intentional optimization to reduce the basis generation and projection complexity; if unchirped, the basis functions show symmetry/antisymmetry about their centers.
We recenter the dW parameter by modifying the basis functions:
This basis modification also requires modifying the dW update step in Algorithm 2:
Using and recentering the dW parameter has benefits, improving both convergence range and dW accuracy. We will document the estimator performance both of 'partial' non-linear operation (recentering only W with an unchirped basis as described in the paper) as well as 'full' non-linear operation (recentering both W and dW with chirped basis functions).
The graphs below show behavior of the linear, partial-nonlinear and full-nonlinear estimators across sets of single-chirp input signals. Estimation was performed with a sine input window and other parameters as in the original 2007 paper.
Near DC, convergence and error behavior most strongly depend on chirp center frequency (W); the following graphs show behavior from DC extending to a few bins above DC. In addition, the phase of an input chirp affects behavior at the edge of convergence, so all graphs show the worst-case behavior run over a swept-phase set of inputs. Estimator behavior is exactly anti-symmetric about the center of the spectrum.
As expected, the linear algorithm shows reliable convergence behavior but, as it does not restart the basis functions or error estimates, error increases steadily above and below Y=0 where the input W estimate already matches the chirp W before iteration. Both the partial- and full-nonlinear algorithms show reduced convergence range but high precision across the domain of frequency convergence, making the total domain of useful output much larger than the linear estimator's. This comes at the cost of a more expensive algorithm.
The linear algorithm shows dW (chirp rate) convergence and error behavior similar to frequency (W) behavior.
The partial-nonlinear algorithm shows error behavior similar to the linear algorithm once there's a significant dW (chirp) component to the input signal. This is not surprising as the partial-nonlinear algorithm is not recentering the basis functions for dW changes and the calculated residual error is also an approximation computed from the basis functions.
The full-nonlinear algorithm shows vastly improved dW behavior similar to W behavior. It gives accurate results across the nearly-complete domain of convergence, which is greatly expanded over the partial-nonlinear algorithm. With the proper window, we can in fact expand the domain of convergence to an arbitrarily large portion of the solution space. More about this later under the section on windowing functions.
A chirp's actual W does not strongly influence the convergence or accuracy of the algorithm once it is more than 5-10 bins above DC (or below Nyquist). W can still affect numerical noise; depending on how the basis and reconstruction chirps are generated, an approach such as cos(W*t) shows a linear loss of precision as |t| or W increases. As such, single-precision basis/reconstruction chirp generation can show noticeably more noise near Nyquist than at DC.
FFT techniques tend not to be particularly sensitive to exact window choice; there's a set of mostly interchangeable 'good' windows that give similar performance for most uses. Unless you have a very specific need, the Hanning window covers 95% of what any DSP engineer would ever need for use with Fourier transforms. This is not the case with the chirp estimator. The window function used for input and basis windowing exerts considerable influence on algorithm accuracy and convergence behavior.
This all points to the possibility that a unimodal window without sidelobes could render the nonlinear estimator's solution space substantially convex for single chirps, and indeed this turns out to be the case. The W and dW terms can be made to converge from initial estimates arbitrarily far away from the correct value. This comes at the cost of typically slower convergence (unimodal windows tend to be broader and roll off more slowly) and some reduction in convergence domain near DC.
Window functions affect linear estimation less profoundly than non-linear estimation. W and dW are not being recentered and as such can't become stuck in local maxima caused by sidelobe leakage. Nonetheless, we expect a window function to influence overall error behavior.
In the above graphs we do indeed see improvement in both absolute error performance and convergence speed depending on window selection. The useful range of linear estimation can be enhanced by at least a factor of two with an appropriate choice of window.
Non-linear estimation is quite sensitive to the window function in use as the W and dW parameters track along the frequency domain energy contours created by the window's mainlobe and sidelobe. Where the linear estimator is ~guaranteed to converge (once we're away from DC/Nyquist), we've already noticed that the nonlinear estimator does not have an inherently convex solution space. A unimodal window, however, can render the solution space similarly convex.
Also of interest are optimized windows that attempt to deliver wider convergence without loss of convergence speed. Machine optimization has shown a few promising window research directions.
NOTE: the scales on the nonlinear graphs cover a considerably wider range than on the linear graphs.
The paper states that second-order parameter fits (dW and ddA) are optional to both the linear and nonlinear algorithms. The nonlinear algorithms are also capable of fitting only one of the second order parameters (while clamping the other to zero), such as we've been doing in the nonlinear graphs above, where we fit dW but not ddA.
(Strictly speaking, the same is true of the first-order parameters; we could choose to fit W and not dA for example. However, because of the alternating even/odd orthogonality of the basis functions, fitting only W and not dA has almost no effect on algorithm behavior. This is visible in the first-order nonlinear graphs below. The same is not true of fitting dW but not ddA.)
The chirp estimation algorithm does not use complex math; the four (or six) basis functions used by chirp estimation are all real sequences. However, because the basis functions are quadrature (sine/cosine) pairs, they behave in some ways similarly to the real/imaginary halves of a complex basis.
The linear estimator calculates basis energies once at the beginning of fitting. The nonlinear estimators must recalculate basis and basis energy in each iteration.
If we treat the sine/cosine basis pairs as if they were a single complex basis, the energy of the three complex basis functions is described by:
Rather than normalizing each of the six real basis projections by the basis energy, we instead normalize by half the 'complex' basis energy. This approximation results in basis energy being determined solely by the window and blocksize, and we can both simplify and move the calculation out of the nonlinear loop.
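A minimal sketch of the idea, assuming the zeroth-order sine/cosine basis pair (the higher-order bases add extra polynomial-in-time factors, which this illustration omits; not the actual chirptest code):

```c
#include <stddef.h>

/* Symmetric normalization sketch: treating a windowed sine/cosine
   basis pair as one complex basis, its energy is
   sum w[i]^2 * (cos^2 + sin^2) = sum w[i]^2, which depends only on
   the window (and blocksize), not on W.  Half of that quantity then
   normalizes either real projection, and it can be computed once,
   outside the nonlinear loop. */
static double half_complex_energy(const float *window, size_t n){
  double e = 0.;
  size_t i;
  for(i = 0; i < n; i++)
    e += (double)window[i]*window[i];
  return e * 0.5;
}
```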
Symmetric normalization approximately halves the number of adds and multiplies needed during the basis projection step of the nonlinear algorithms, as normalization energy is not explicitly calculated. That said, the time spent generating new, recentered basis functions each iteration typically swamps the cost of the actual basis projection onto the error vector. As such the time saved by symmetric normalization may be insignificant, depending on the relative expense of basis generation.
The linear algorithm generates basis functions only once and so must compute normalization factors only once. Symmetric normalization may become useful in the event that the linear algorithm is run repeatedly for only a few iterations each time, thus increasing the relative amount of time the linear algorithm would spend calculating normalization factors during initialization.
Page 10 of the 2007 paper discusses the quantity 'alpha' (α) which is described as an update rate used to scale the update added to the W parameter each iteration:
The paper presents a graph that shows the optimal value for alpha is about 1.0. The exact relationship between W convergence speed and alpha is slightly more complex, but 1.0 does hold up as a good value for alpha upon closer inspection. The optimal value of alpha is affected somewhat by window choice.
In the fully nonlinear estimator, a similar update scaling coefficient can be applied to the dW update quantity:
Adjusting the dW alpha parameter affects speed of convergence for both W and dW.
The original W alpha value is affected by window choice, but the dW alpha parameter is much more so. The graphs make it clear that the optimal alpha for dW updates depends strongly on window and that we can do much better than a value of 1.0.
The following graphs replot the near-DC behavior for several windows as presented earlier, but select a dW alpha appropriate for each window to show the effect on convergence speed and error behavior.
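The alpha-scaled update step under discussion can be sketched as (an illustrative helper, not the actual chirptest code):

```c
/* Each iteration's raw correction delta is multiplied by an update
   rate alpha before being applied to the parameter.  The same form
   serves both the W update (alpha near 1.0 works well) and the dW
   update (whose best alpha depends strongly on the window). */
static double scaled_update(double param, double delta, double alpha){
  return param + alpha*delta;
}
```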
The effect of noise on estimation algorithm behavior is mostly uninteresting. As with the FFT, larger block sizes decrease the relative contribution of noise. Noise energy contribution to the fit parameters is governed primarily by the equivalent noise bandwidth of the window in use.
The graphs below illustrate that noise has little effect on convergence speed and none on stability, and contributes energy to fit parameters as predicted by window characteristics. As with all the graphs here, the plots below use a blocksize of only 256 samples.
Two behaviors mostly determine the ability of the chirp estimation algorithm to discriminate and individually resolve closely spaced peaks. The specifics of both are determined by the choice of window function.
Neither the input data chirps nor the basis functions used by the chirp estimator are orthogonal to one another; they are usually only 'mostly orthogonal'. The farther apart chirps (and chirp estimates) are, the less they interact.
Fit accuracy of closely spaced sinusoids and chirps is limited by the effective orthogonality. As with the FFT, the wider the mainlobe of the window in use, the worse the 'in-close' frequency discrimination.
In order to demonstrate orthogonality limits in isolation, the graphs above show an approximately best-case situation using an accurate initial estimate (exact W estimate, all other parameters randomized within +/-.1 of the correct value and amplitude within +/-.1dB). Additional estimate error does not affect orthogonality limits or behavior, but it can bring main/sidelobe capture into play.
The chirp estimator is essentially a gradient-climbing [non-]linear solver; it walks up the hill of an objective function until it reaches the top. Just like in the FFT, the energy leakage from a strong peak (both mainlobe and sidelobe) can swamp weaker peaks in the immediate vicinity, either moving the apparent maximum or disguising it completely.
The chirp estimator both fits peaks in order of estimated energy and subtracts already-estimated energy from the fitting process. This reduces the apparent energy leakage and unmasks weaker genuine peaks in the same immediate vicinity. The chirp estimator is thus potentially better able to discriminate closely spaced peaks than the FFT when using the same window function.
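The energy-ordered fitting described above might be sketched as follows (the struct and helpers are illustrative, not the actual chirptest API):

```c
#include <stdlib.h>

/* Estimates are sorted strongest-first so that each stronger chirp
   is fit, and its estimated energy subtracted from the residual,
   before weaker neighbors are fit. */
typedef struct {
  float A;  /* estimated amplitude */
  float W;  /* estimated center frequency */
} chirp_est;

static int by_energy_desc(const void *a, const void *b){
  float ea = ((const chirp_est *)a)->A * ((const chirp_est *)a)->A;
  float eb = ((const chirp_est *)b)->A * ((const chirp_est *)b)->A;
  return (ea < eb) - (ea > eb);   /* descending energy */
}

static void sort_strongest_first(chirp_est *c, size_t n){
  qsort(c, n, sizeof(*c), by_energy_desc);
}
```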
However, this extra discrimination gain is conditional on a good enough initial estimate. In the nonlinear algorithms, a poor initial estimate allows strong peaks to 'capture' weaker peaks in the first few iterations by deflecting the W and dW estimates of weaker peaks too far toward the stronger peak. Recentering causes them to collapse into the stronger peak. The linear algorithm does not recenter the W (or dW) parameter, and so it does not exhibit main/sidelobe capture.
Though the quality of the initial estimate does not, for the most part, affect the absolute error of the final nonlinear fit, we can see in the graphs below that there is an 'event horizon' beyond which the energy leakage of a strong peak captures the initial estimate of the weaker peak, causing it to converge to the wrong chirp. This event horizon is at the point where leakage energy from a stronger peak is approximately as strong as the weaker peak itself.
The graphs above illustrate sidelobe capture resulting from a poor initial estimate. Each plots the fit of an unchirped sinusoid in the immediate vicinity of a stronger reference sinusoid. Both sinusoids are fit using exact initial estimates but with amplitudes of zero. The weaker sinusoid's estimate is placed first in the list so that without amplitude estimates it is fit first in the initial iteration. As such, the plots show the limits of main/sidelobe capture when the weaker sinusoid is captured by the energy leakage of the stronger sinusoid.
We now graph fit behavior of realistic initial estimates, approximately intended to match the level of accuracy we could reasonably achieve by estimating parameters from an initial FFT. The full-nonlinear algorithm is substantially unaffected by additional estimate uncertainty so long as we avoid main/sidelobe capture. The partial-nonlinear estimator likewise behaves well so long as the dW estimate is accurate. The linear algorithm requires both an accurate W and dW parameter.
The first graphs plot two unchirped sinusoids, so we see similar excellent behavior from the full- and partial-nonlinear algorithms. The error in the W estimate degrades the linear algorithm's performance.
The graphs above use randomized estimates with A (amplitude) within 6dB, W (frequency) within .5 FFT bins and P (phase) within .1 radians. For purposes of visual presentation, W estimate error is randomized away from the neighboring chirp; this keeps apparent mainlobe width the same. A estimate error is also randomized away from the neighboring chirp's value in order to guarantee that the amplitude sort order remains correct in all trials.
We note that the 'realistic' estimate shows some main/sidelobe capture due to remaining estimate error from the stronger sinusoid. In the region outside of capture, we see that the fit error and convergence behavior are largely unaffected by the additional initial estimate error as expected.
Using the same estimate bounds parameters, the graphs below illustrate the interaction of frequency (W) and chirp rate (dW) when discriminating a test chirp from a central reference sinusoid. Note that no initial estimate is given for the chirp rate (it is zero) as an FFT cannot be used to estimate chirp rate directly.
As expected, we see continued excellent error performance from the full-nonlinear algorithm. The partial-nonlinear algorithm cannot recenter dW and thus shows degraded performance due to the initial dW estimate of zero. The linear algorithm is doubly hurt by the error in both W and dW.
First and foremost, we finally have a fairly extensive characterization of the behavior we can expect from our chirp estimator. This characterization comes with benchmarking code useful for both optimization and regression testing.
Most of the conclusions from the graphs are nuanced and numeric; they are better presented in visual form than in a text summary. However, a few particularly interesting concepts are worth repeating.
As always, happy hacking!