Page 64 - Intermediate Statistics for Dummies
Chapter 2: Sorting through Statistical Techniques
The problem
Statistics textbooks sometimes show two formulas for the variance of a data
set. One formula shown for the variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n},$$

where n is the sample size, the values of x are the data values, and the
sample mean (or the average of all the values of the data set) is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

This formula for variance, you may note, contains an n all by itself in the
denominator. The fact
that the denominator is n and not n – 1 makes a teacher’s job of explaining
variance a whole lot easier, because it represents the average squared
distance from the mean. In this case, the values being squared are the differ-
ences between the data values and their mean. You get the average of these
squared values by summing them up and dividing by n, the sample size.
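
To make the averaging concrete, here is a minimal Python sketch of the divide-by-n formula. The data values and the function name variance_divide_by_n are made up for illustration and do not come from the text.

```python
def variance_divide_by_n(data):
    """Average squared distance from the mean (n in the denominator)."""
    n = len(data)
    mean = sum(data) / n
    # Square the difference between each data value and the mean.
    squared_diffs = [(x - mean) ** 2 for x in data]
    # Sum the squared differences and divide by the sample size n.
    return sum(squared_diffs) / n


# Hypothetical data set, chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
result = variance_divide_by_n(data)
print(result)  # 4.0
```

The mean here is 5, the squared differences add up to 32, and 32 divided by n = 8 gives 4.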
However, this version of a formula for variance, as it’s written, is biased. That
means, in a statistical sense, that in the long term the results are off, on
average, by a small amount from their target value. If you take repeated
samples, find the variance, and do this over and over, the results on average
are a little smaller than they should be. (Statisticians can prove this, but you
don’t have to worry about that. I’m sure you have better things to do.)
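
You can watch the bias show up without any proof by simulating the repeated sampling described above. The sketch below makes several assumptions that are not in the text: a normal population with a known variance of 4, samples of size 5, 20,000 repetitions, and a fixed random seed so the run is repeatable.

```python
import random


def variance_divide_by_n(sample):
    """Variance with n in the denominator (the biased version)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


rng = random.Random(0)   # fixed seed (an arbitrary choice) for repeatability
true_variance = 4.0      # assumed population: normal, mean 0, std dev 2
sample_size = 5          # assumed sample size n
num_samples = 20000      # assumed number of repeated samples

estimates = []
for _ in range(num_samples):
    # Draw one sample and record its divide-by-n variance.
    sample = [rng.gauss(0, 2) for _ in range(sample_size)]
    estimates.append(variance_divide_by_n(sample))

# Average the variance estimates over all the repeated samples.
average_estimate = sum(estimates) / num_samples
print(average_estimate)  # should land near 3.2, below the target of 4.0
```

With n = 5, the divide-by-n version comes out, on average, at about (n – 1)/n = 4/5 of the true variance, which is why the average should sit near 3.2 rather than 4.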
The solution
Because statisticians prefer results being correct to results that can be more
easily explained, they decided to do something about this bias problem in
the formula for the sample variance. A group of stat bigwigs figured out that
dividing by n was the problem, and if you divide by n – 1 rather than n, you
get answers that are, on average, right on target. That’s how the following commonly
used formula for sample variance came into being:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Notice that an n – 1 rather than an n is now in the denominator. However,
trying to explain why the formula isn’t dividing by n does tend to open up a
can of worms for statistics professors (and explains why biased statistics are
a topic left for the intermediate-level students, like you!).
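
As a rough sketch, the n – 1 version of the formula can be written in Python as follows. The data values and the function name sample_variance are hypothetical. The comparison uses the standard library's statistics module, where statistics.variance divides by n – 1 and statistics.pvariance divides by n.

```python
import statistics


def sample_variance(data):
    """Sample variance with n - 1 in the denominator."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Hypothetical data set: mean 5, squared differences sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)
print(s2)                          # 32 / 7, roughly 4.571
print(statistics.variance(data))   # library version that divides by n - 1
print(statistics.pvariance(data))  # library version that divides by n
```

For the same data, dividing by n – 1 always gives a slightly larger answer than dividing by n, and the gap shrinks as the sample size grows.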
Because statistics can be biased too, in terms of the results they create
through their formulas alone, it’s always a good idea to check with a statisti-
cian or someone else in the know whether a particular statistic is unbiased
before you use it.