Page 128 - Statistics for Dummies
Part II: Number-Crunching Basics
✓ If the mean is much larger than the median, the data are generally
skewed right; a few values are much larger than the rest and pull the mean up.
✓ If the mean is much smaller than the median, the data are generally
skewed left; a few values are much smaller than the rest and pull the mean down.
✓ If the mean and median are close, the data are fairly balanced,
or symmetric, on each side.
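The three rules of thumb above can be sketched in Python. This is a minimal illustration, not a method from the book: the function name `describe_skew` and its `tolerance` cutoff (how big a gap counts as "much larger") are hypothetical choices. Here the gap between mean and median is measured against the standard deviation, so the verdict doesn't depend on the units of the data.

```python
from statistics import mean, median, stdev


def describe_skew(data, tolerance=0.1):
    """Classify a data set's shape by comparing its mean and median.

    `tolerance` is a hypothetical cutoff: the mean counts as "much larger"
    or "much smaller" than the median when the gap exceeds
    tolerance * (sample standard deviation). Needs at least 2 values.
    """
    gap = mean(data) - median(data)
    threshold = tolerance * stdev(data)
    if gap > threshold:
        return "skewed right"   # a few large values pull the mean up
    if gap < -threshold:
        return "skewed left"    # a few small values pull the mean down
    return "roughly symmetric"


print(describe_skew([1, 2, 2, 3, 3, 3, 4, 20]))        # skewed right
print(describe_skew([20, 19, 19, 18, 18, 18, 17, 1]))  # skewed left
print(describe_skew([1, 2, 3, 4, 5]))                  # roughly symmetric
```

In the first list the mean is 4.75 while the median is 3, because the single value 20 drags the mean upward; the median barely notices it.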
Under certain conditions, you can combine the mean and standard deviation
to describe a data set in quite a bit of detail. If the data have a normal
distribution (a bell-shaped hill in the middle, sloping down at the same rate
on each side; see Chapter 5), you can apply the Empirical Rule.
The Empirical Rule (also in Chapter 5) says that if the data have a normal
distribution, about 68% of the data lie within 1 standard deviation of the mean,
about 95% lie within 2 standard deviations of the mean, and about
99.7% lie within 3 standard deviations of the mean. These percentages are
custom-made for the normal distribution (bell-shaped data) only and
can't be used for data sets of other shapes.
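To see where the 68%, 95%, and 99.7% figures come from, and why they break down for other shapes, here is a small Python sketch. The helper `share_within` is a hypothetical name; the theoretical shares come from the standard library's `statistics.NormalDist` (Python 3.8 or later), and the simulated samples use a fixed random seed only so the run is repeatable.

```python
import random
from statistics import NormalDist, mean, pstdev


def share_within(data, k):
    """Fraction of values lying within k standard deviations of the mean.

    Uses the population standard deviation (pstdev).
    """
    m, s = mean(data), pstdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


# Theoretical shares for a normal distribution: P(-k < Z < k).
z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    print(k, round(z.cdf(k) - z.cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Simulated data: bell-shaped vs. skewed right (exponential).
random.seed(1)  # fixed seed so the run is repeatable
bell = [random.gauss(50, 10) for _ in range(10_000)]
skewed = [random.expovariate(1.0) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(share_within(bell, k), 3), round(share_within(skewed, k), 3))
```

The bell-shaped sample lands close to 68%, 95%, and 99.7%. For exponential data, which are strongly skewed right, the exact share within 1 standard deviation is 1 - e^(-2), or about 86%, so the Empirical Rule's percentages simply don't apply.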
Detecting misleading histograms
There are no hard and fast rules for how to create a histogram; the person
making the graph gets to choose the groupings on the x-axis as well as the
scale and the starting and ending points on the y-axis. Just because there is an
element of choice, however, doesn't mean every choice is appropriate; in
fact, a histogram can be made misleading in many ways. In the following
sections, you see examples of misleading histograms and how to spot them.
Missing the mark with too few groups
Although the number of groups in a histogram is up to the discretion of the
person making the graph, there is such a thing as going overboard, either by
using way too few bars, so that everything is lumped together, or by using
way too many bars, so that every little difference is magnified.
To judge whether a histogram has a sensible number of bars, I take a good look
at the groupings used to form the bars on the x-axis and see whether they make sense.
For example, it doesn't make sense to talk about exam scores in groups of 2
points; that's too much detail, with too many bars. On the other hand, it doesn't
make sense to group actresses' ages in intervals of 20 years; that's not
descriptive enough, with too few bars.
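The effect of the grouping choice is easy to see by binning the same numbers with different bar widths. The sketch below is illustrative only: `bin_counts` is a hypothetical helper, and `scores` is a small made-up list of exam scores, not data from the book.

```python
import math


def bin_counts(values, start, width):
    """Count how many values fall in each bar of a histogram.

    Bar i covers [start + i*width, start + (i+1)*width); `start` must not
    exceed the smallest value.
    """
    if min(values) < start:
        raise ValueError("start must be <= the smallest value")
    n_bars = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bars
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


# Made-up exam scores, for illustration only.
scores = [55, 62, 68, 71, 74, 77, 79, 81, 83, 85, 88, 90, 94, 97]

print(len(bin_counts(scores, 50, 2)))  # 24 bars of only 0s and 1s: too much detail
print(bin_counts(scores, 50, 10))      # [1, 2, 4, 4, 3]: the overall shape shows
print(bin_counts(scores, 50, 25))      # [5, 9]: too lumped together to say much
```

With 2-point bars every score gets a bar to itself, so the histogram is a flat row of single counts; with 25-point bars nearly all the structure disappears.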
Figures 7-4 and 7-5 illustrate this point. Each histogram summarizes n = 222
observations of the amount of time between eruptions of the Old Faithful
geyser in Yellowstone Park. Figure 7-4 uses six bars that group the data by