Timothy B. Terriberry
The kth central moment of a (univariate, real) random variable X is defined as
|µk = E[(X - E[X])k] ,|
if the expectation exists. The second central moment µ2 = σ2, the population variance. For an empirical distribution, these moments are usually computed in two passes. The first pass computes the mean,
|µ = E[X] = (1/N)||∑||xi ,|
where N is the size of the population, and the second pass computes
|µk = (1/N)||∑||(xi - µ)k .|
The two-pass algorithm is numerically stable, even when, for example, µ is large and µk is small. For k = 2, one can apply a well-known correction factor due to Åke Björck [CGL83]:
|µ2 = (1/N)||∑||(xi - µ)2 - (µ1)2 .|
By definition, µ1 = 0 when evaluated with exact arithmetic, but it may be non-zero with finite precision arithmetic.
It is often useful to be able to compute the mean and variance of a population in a single pass. For example, it may be easy to generate a large random sample of a distribution, but prohibitively expensive to store all those samples, or difficult to retain the state required to compute the same set of samples multiple times. Or, one may wish to compute a running total as new data is collected. Alternatively, one may be interested in computing such statistics in parallel, where communication costs (memory references) dominate1. Chan et al. give updating formulas for computing the mean and variance of a set X given the means and variances of any partition of X into two sets XA and XB [CGL79]:
|δ||= µB - µA|
|Μ2||= Μ2A + Μ2B + δ2·NANB/N ,|
where µ2 = Μ2/N . These update formulas allow a pairwise summation algorithm that builds up sets of equal size and merges them to compute µ and σ2 in a single pass using only O(log N) storage. This reduces the relative error in the mean from the O(N) obtained by a naive algorithm to O(log N), and the variance shows a similar improvement. Alternatively, one can take NB=1, and obtain a constant time updating algorithm with O(1) storage, though this sacrifices some of the nice numerical properties.
Although these formulas have been known for some time, we are unaware of a reference for similar formulas for higher-order moments. The third and fourth central moments are needed, for example, to compute the skewnewss and kurtosis of a distribution, which are used, e.g., in the Jarque-Bera test to estimate the probability that a distribution is Gaussian. The skewness, S, and the kurtosis, K, are given by
|K||= µ4/σ4 .|
The corresponding pairwise update formulas for µ3 and µ4 are given by
|Μ3||= Μ3A + Μ3B + δ3·NANB(NA - NB)/N2 + 3(NAΜ2B - NBΜ2A)δ/N|
|Μ4||= Μ4A + Μ4B + δ4·NANB((NA)2 - NANB+(NB)2)/N3 + 6((NA)2Μ2B + (NB)2Μ2A)δ2/N2 + 4(NAΜ3B - NBΜ3A)δ/N ,|
where again we define µk = Μk/N . These formulas are quite complex compared to the deceptively simple formulas for Μ2, though this complexity can be reduced considerably by assuming NA = NB or NB = 1 . Although one can see a number of patterns emerging, no recurrence for general k is known.
1In fact, my original use for them was computing the zero-mean normalized cross correlation between two images on a GPU, where the per-processor memory bandwidth is extremely small and only single-precision floating point arithmetic is available. They have come up several times since then, however.
|Comments or questions? Send them to firstname.lastname@example.org||Created 9 Dec. 2007, last updated 15 Oct. 2008|