Page 99 - Statistics for Dummies
Chapter 5: Means, Medians, and More
Adding another standard deviation on either side of the mean increases
the percentage from 68 to 95, which is a big jump and gives a good idea of
where “most” of the data are located. Most researchers stay with the 95%
range (rather than 99.7%) for reporting their results, because increasing the
range to 3 standard deviations on either side of the mean (rather than just 2)
doesn’t seem worthwhile, just to pick up that last 4.7% of the values.
The Empirical Rule tells you about what percentage of values are within a
certain range of the mean, and I need to stress the word about. These results
are approximations only, and they only apply if the data follow a normal
distribution. However, the Empirical Rule is an important result in statistics because
the concept of “going out about two standard deviations to get about 95% of
the values” is one that you see mentioned often with confidence intervals and
hypothesis tests (see Chapters 13 and 14).
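To see why the word about matters, here is a minimal sketch that compares the Empirical Rule's rounded percentages with the exact areas under a normal curve. It assumes Python 3.8 or later and uses the standard library's statistics.NormalDist; the helper name share_within is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


# Compare with the Empirical Rule's rounded 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

The exact figures come out near 68.27%, 95.45%, and 99.73%, so the 68, 95, and 99.7 in the rule are convenient roundings rather than precise values.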
Here’s an example of using the Empirical Rule to better describe a population
whose values have a normal distribution: In a study of how people make
friends in cyberspace using newsgroups, the age of the users of an Internet
newsgroup was reported to have a mean of 31.65 years, with a standard
deviation of 8.61 years. Suppose the data were graphed using a histogram and
were found to have a bell-shaped curve similar to what’s shown in Figure 5-2.
According to the Empirical Rule, about 68% of the newsgroup users had ages
within 1 standard deviation (8.61 years) of the mean (31.65 years). So about
68% of the users were between ages 31.65 – 8.61 years and 31.65 + 8.61 years,
or between 23.04 and 40.26 years. About 95% of the newsgroup users were
between the ages of 31.65 – 2(8.61) and 31.65 + 2(8.61), or between 14.43 and
48.87 years. Finally, about 99.7% of the newsgroup users’ ages were between
31.65 – 3(8.61) and 31.65 + 3(8.61), or between 5.82 and 57.48 years.
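The arithmetic in this example can be sketched in a few lines of Python. The mean (31.65) and standard deviation (8.61) come from the newsgroup study above; the function name empirical_rule_ranges and the APPROX_PERCENT table are hypothetical names introduced for illustration, and the results are meaningful only under the assumption that the ages are roughly normal.

```python
def empirical_rule_ranges(mean: float, sd: float) -> dict:
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations around the mean."""
    return {
        k: (round(mean - k * sd, 2), round(mean + k * sd, 2))
        for k in (1, 2, 3)
    }


# Approximate percentages from the Empirical Rule (normal data only).
APPROX_PERCENT = {1: 68, 2: 95, 3: 99.7}

ranges = empirical_rule_ranges(31.65, 8.61)
for k, (low, high) in ranges.items():
    print(f"about {APPROX_PERCENT[k]}% of ages fall between {low} and {high}")
```

Running it reproduces the three age ranges worked out in the text: 23.04 to 40.26, 14.43 to 48.87, and 5.82 to 57.48 years.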
This application of the rule gives you a much better idea about what’s
happening in this data set than just looking at the mean, doesn’t it? As you
can see, the mean and standard deviation used together add value to your
results; plugging these values into the Empirical Rule allows you to report
ranges for “most” of the data yourself.
Remember, the condition for being able to use the Empirical Rule is that the
data have a normal distribution. If that’s not the case (or if you don’t know
what the shape actually is), you can’t use it. To describe your data in these
cases, you can use percentiles, which represent certain cutoff points in the
data (see the later section “Gathering a five-number summary”).
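As a rough illustration of the percentile approach, the sketch below computes a five-number summary (minimum, 25th percentile, median, 75th percentile, maximum) with Python's standard statistics module. The ages list is made-up sample data, not taken from the newsgroup study, and five_number_summary is a hypothetical helper name. Note that statistical packages use slightly different conventions for computing percentiles, so other tools may give slightly different quartiles for the same data.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles defaults."""
    q1, median, q3 = quantiles(data, n=4)  # three cut points split the data into quarters
    return (min(data), q1, median, q3, max(data))


# Hypothetical ages, invented purely for illustration.
ages = [19, 22, 24, 27, 29, 31, 33, 38, 45, 52, 61]
print(five_number_summary(ages))
```

Unlike the Empirical Rule, this summary makes no assumption about the shape of the data, which is why it is a reasonable fallback when the histogram is not bell-shaped.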