Page 99 - Statistics for Dummies

Chapter 5: Means, Medians, and More
                                                    Adding another standard deviation on either side of the mean increases
                                                    the percentage from 68 to 95, which is a big jump and gives a good idea of
                                                    where “most” of the data are located. Most researchers stay with the 95%
                                                    range (rather than 99.7%) for reporting their results, because increasing the
                                                    range to 3 standard deviations on either side of the mean (rather than just 2)
                                                    doesn’t seem worthwhile, just to pick up that last 4.7% of the values.
                                                    The Empirical Rule tells you about what percentage of values are within a cer-
                                                    tain range of the mean, and I need to stress the word about. These results are
                                                    approximations only, and they only apply if the data follow a normal distribu-
                                                    tion. However, the Empirical Rule is an important result in statistics because
                                                    the concept of “going out about two standard deviations to get about 95% of
                                                    the values” is one that you see mentioned often with confidence intervals and
                                                    hypothesis tests (see Chapters 13 and 14).
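
If you want to see where the 68, 95, and 99.7 figures come from, here is a minimal sketch that computes them for an exactly normal distribution. It relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k/√2). The function name share_within is made up for this illustration and is not from the book.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The exact shares come out to roughly 0.6827, 0.9545, and 0.9973, which is why the rule is stated with the word "about." It also means the gain from going out a third standard deviation is closer to 4.3% than to the 4.7% you get by subtracting the rounded figures.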
                                                    Here’s an example of using the Empirical Rule to better describe a popula-
                                                    tion whose values have a normal distribution: In a study of how people make
                                                    friends in cyberspace using newsgroups, the age of the users of an Internet
                                                    newsgroup was reported to have a mean of 31.65 years, with a standard devi-
                                                    ation of 8.61 years. Suppose the data were graphed using a histogram and
                                                    were found to have a bell-shaped curve similar to what’s shown in Figure 5-2.
                                                    According to the Empirical Rule, about 68% of the newsgroup users had ages
                                                    within 1 standard deviation (8.61 years) of the mean (31.65 years). So about
                                                    68% of the users were between ages 31.65 – 8.61 years and 31.65 + 8.61 years,
                                                    or between 23.04 and 40.26 years. About 95% of the newsgroup users were
                                                    between the ages of 31.65 – 2(8.61) and 31.65 + 2(8.61), or between 14.43 and
                                                    48.87 years. Finally, about 99.7% of the newsgroup users’ ages were between
                                                    31.65 – 3(8.61) and 31.65 + 3(8.61), or between 5.82 and 57.48 years.
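
The arithmetic in the newsgroup example can be sketched in a few lines of Python. The function name empirical_rule_ranges is hypothetical, chosen only for this illustration. The mean and standard deviation are the ones reported above, and the results are meaningful only under the assumption that the ages are roughly normally distributed.

```python
def empirical_rule_ranges(mean, sd):
    """Return the (low, high) range for 1, 2, and 3 standard deviations from the mean."""
    labels = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    ranges = {}
    for k, label in labels.items():
        # Go out k standard deviations on either side of the mean.
        ranges[label] = (round(mean - k * sd, 2), round(mean + k * sd, 2))
    return ranges

# Newsgroup ages from the example: mean 31.65 years, standard deviation 8.61 years.
for label, (low, high) in empirical_rule_ranges(31.65, 8.61).items():
    print(f"{label} of users: between {low} and {high} years")
```

You can reuse the same function for any data set with a bell-shaped histogram by passing in its mean and standard deviation.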
                                                    This application of the rule gives you a much better idea about what’s hap-
                                                    pening in this data set than just looking at the mean, doesn’t it? As you
                                                    can see, the mean and standard deviation used together add value to your
                                                    results; plugging these values into the Empirical Rule allows you to report
                                                    ranges for “most” of the data yourself.
                                                    Remember, the condition for being able to use the Empirical Rule is that the
                                                    data have a normal distribution. If that’s not the case (or if you don’t know
                                                    what the shape actually is), you can’t use it. To describe your data in these
                                                    cases, you can use percentiles, which represent certain cutoff points in the
                                                    data (see the later section “Gathering a five-number summary”).
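
As a rough illustration of the percentile approach, the sketch below finds the 25th, 50th, and 75th percentiles of a small skewed data set using Python's standard statistics module. The data values are invented for this example and do not come from any study. Note also that software packages use slightly different conventions for computing percentiles, so the cutoffs can differ a little from hand calculations.

```python
import statistics

# Hypothetical, made-up values with one large outlier, so the shape is not bell-shaped.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

# n=4 splits the data into four equal-sized groups and returns the three
# cutoff points: the 25th, 50th, and 75th percentiles.
q1, median, q3 = statistics.quantiles(data, n=4)

print(f"25th percentile: {q1}")
print(f"50th percentile (median): {median}")
print(f"75th percentile: {q3}")
```

Unlike the ranges from the Empirical Rule, these cutoffs make no assumption about the shape of the data, so the single value of 50 barely affects them.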











