The kth central moment of a (univariate, real) random variable X is defined as
µ_{k} = E[(X - E[X])^{k}] , |
if the expectation exists. The second central moment µ_{2} = σ^{2}, the population variance. For an empirical distribution, these moments are usually computed in two passes. The first pass computes the mean,
N-1 | ||
µ = E[X] = (1/N) | ∑ | x_{i} , |
i=0 |
where N is the size of the population, and the second pass computes
N-1 | ||
µ_{k} = (1/N) | ∑ | (x_{i} - µ)^{k} . |
i=0 |
The two-pass algorithm is numerically stable, even when, for example, µ is large and µ_{k} is small. For k = 2, one can apply a well-known correction factor due to Åke Björck [CGL83]:
N-1 | ||
µ_{2} = (1/N) | ∑ | (x_{i} - µ)^{2} - (µ_{1})^{2} . |
i=0 |
By definition, µ_{1} = 0 when evaluated with exact arithmetic, but it may be non-zero with finite precision arithmetic.
It is often useful to be able to compute the mean and variance of a population in a single pass. For example, it may be easy to generate a large random sample of a distribution, but prohibitively expensive to store all those samples, or difficult to retain the state required to compute the same set of samples multiple times. Or, one may wish to compute a running total as new data is collected. Alternatively, one may be interested in computing such statistics in parallel, where communication costs (memory references) dominate^{1}. Chan et al. give updating formulas for computing the mean and variance of a set X given the means and variances of any partition of X into two sets X^{A} and X^{B} [CGL79]:
δ | = µ^{B} - µ^{A} |
µ | = µ^{A}+δ·N^{B}/N |
Μ_{2} | = Μ_{2}^{A} + Μ_{2}^{B} + δ^{2}·N^{A}N^{B}/N , |
where µ_{2} = Μ_{2}/N . These update formulas allow a pairwise summation algorithm that builds up sets of equal size and merges them to compute µ and σ^{2} in a single pass using only O(log N) storage. This reduces the relative error in the mean from the O(N) obtained by a naive algorithm to O(log N), and the variance shows a similar improvement. Alternatively, one can take N^{B}=1, and obtain a constant time updating algorithm with O(1) storage, though this sacrifices some of the nice numerical properties.
Although these formulas have been known for some time, we are unaware of a reference for similar formulas for higher-order moments. The third and fourth central moments are needed, for example, to compute the skewnewss and kurtosis of a distribution, which are used, e.g., in the Jarque-Bera test to estimate the probability that a distribution is Gaussian. The skewness, S, and the kurtosis, K, are given by
S | = µ_{3}/σ^{3} |
K | = µ_{4}/σ^{4} . |
The corresponding pairwise update formulas for µ_{3} and µ_{4} are given by
Μ_{3} | = Μ_{3}^{A} + Μ_{3}^{B} + δ^{3}·N^{A}N^{B}(N^{A} - N^{B})/N^{2} + 3(N^{A}Μ_{2}^{B} - N^{B}Μ_{2}^{A})δ/N |
Μ_{4} | = Μ_{4}^{A} + Μ_{4}^{B} + δ^{4}·N^{A}N^{B}((N^{A})^{2} - N^{A}N^{B}+(N^{B})^{2})/N^{3} + 6((N^{A})^{2}Μ_{2}^{B} + (N^{B})^{2}Μ_{2}^{A})δ^{2}/N^{2} + 4(N^{A}Μ_{3}^{B} - N^{B}Μ_{3}^{A})δ/N , |
where again we define µ_{k} = Μ_{k}/N . These formulas are quite complex compared to the deceptively simple formulas for Μ_{2, though this complexity can be reduced considerably by assuming NA = NB or NB = 1 . Although one can see a number of patterns emerging, no recurrence for general k is known. }
^{}1In fact, my original use for them was computing the zero-mean normalized cross correlation between two images on a GPU, where the per-processor memory bandwidth is extremely small and only single-precision floating point arithmetic is available. They have come up several times since then, however.
Comments or questions? Send them to tterribe@xiph.org | Created 9 Dec. 2007, last updated 15 Oct. 2008 |