Standard deviation is a statistic parameter that helps to estimate the dispersion of data series. It's usually calculated in two passes: first, you find a mean, and second, you calculate a square deviation of values from the mean:

```
double std_dev1(double a[], int n) {
if(n == 0)
return 0.0;
double sum = 0;
for(int i = 0; i < n; ++i)
sum += a[i];
double mean = sum / n;
double sq_diff_sum = 0;
for(int i = 0; i < n; ++i) {
double diff = a[i] - mean;
sq_diff_sum += diff * diff;
}
double variance = sq_diff_sum / n;
return sqrt(variance);
}
```

But you can do the same thing in one pass. Rewrite the formula in the following way:

```
double std_dev2(double a[], int n) {
if(n == 0)
return 0.0;
double sum = 0;
double sq_sum = 0;
for(int i = 0; i < n; ++i) {
sum += a[i];
sq_sum += a[i] * a[i];
}
double mean = sum / n;
double variance = sq_sum / n - mean * mean;
return sqrt(variance);
}
```

Unfortunately, the result will be inaccurate when the array contains large numbers (see the comments below).

This trick is mentioned in an old Soviet book about programmable calculators (here is the reference for Russian readers: Финк Л. Папа, мама, я и микрокалькулятор. — М.: Радио и связь, 1988).

## 18 comments

Ten recent comments are shown below. Show all comments

It does not require two divisions - only one.

Arne, thank you for your algorithm. I've already seen similar algorithms but I don't know how they are derived. I'd greatly appreciate if you would point me to some source which explains derivation of this and/or similar algorithms. According to some experiments I made for some similar calculations the algorithms like the one you presented here give significantly better results compared to the "one pass, full squares" (which can produce even completely wrong results for "inconvenient" input) but they can still be noticeably less accurate than the real two pass algorithm. Do you know of any source where such error analysis is shown? Thank you.

As documented in previous comments, the error can be large for the one-pass method.

But sometimes the input to sqrt can be negative, resulting in... well, who knows what, given that you're coding in C.

Try a list with 3 copies of 1.4592859018312442e+63 for example.

Thank you for the example. There seems to be a problem with very large values and little or no difference between them. std_dev1 works fine, while std_dev2 gives incorrect results.

Knuth actually has one pass algorithm like this

_M gets you mean, _C / (N-1) is variance

Thanks, Ildar. There's so many gems in Knuth's toolkit.

@Ace, if you're still around, walk thru the algebra carefully, paying attention to the fact that n times the average is definitionally the sum.

http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf

above PDF link is broken. Below link can be used.

http://suave_skola.varak.net/proj2/stddev.pdf

Nice Article. Thanks..

Pushpender,

Thank you for the link.

I tried the method used in "2.2.2 A better formula". Am able to get good results to a great extent. I would like to know the derivation of the formula. Please suggest where I can find it (I tried to check the references mentioned in the paper but it is not there).