Page 126 - Statistics for Dummies
Part II: Number-Crunching Basics
represent clumps of data that are close together; a flat histogram shows data
equally dispersed, with more variability.
Variability in a histogram is higher when the taller bars are more spread out
around the mean and lower when the taller bars are close to the mean.
For the Best Actress Award winners’ ages shown in Figure 7-1, you see that many
actresses are in the age range from 30 to 35, and most of the ages are between
20 and 50 years, which is quite diverse. Then you have the outliers, those
few older actresses (I count 7 of them) who spread the data out farther,
increasing its overall variability.
The most common statistic used to measure variability in a data set is the
standard deviation, which in a rough sense measures the average distance
that the data lie from the mean. The standard deviation for the Best Actress
age data is 11.35 years. (See Chapter 5 for all the details on standard
deviation.) A standard deviation of 11.35 years is fairly large in the context of
this problem, but keep in mind that the standard deviation is based on average
distance from the mean, and the mean is influenced by outliers, so the standard
deviation is as well (see Chapter 5 for more information).
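To see how much a single outlier can move the standard deviation, here is a small sketch using Python's standard `statistics` module. The ages are hypothetical values made up for illustration (they are not the Best Actress data), and `stdev` computes the sample standard deviation (dividing by n − 1), which may not be the exact convention behind the 11.35 figure.

```python
from statistics import mean, stdev

# Hypothetical ages, made up for illustration (NOT the Best Actress data).
ages = [26, 28, 29, 30, 31, 33, 34, 35, 37, 41, 45]

# Same data, except the oldest age (45) is replaced by one extreme value.
ages_with_outlier = ages[:-1] + [81]

for label, data in [("no outlier", ages), ("with outlier", ages_with_outlier)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{label}: mean = {mean(data):.2f}, SD = {stdev(data):.2f}")
```

Changing just one value pulls the mean up, and because every distance is measured from that mean, the standard deviation more than doubles.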
In the later section “Interpreting a boxplot,” I discuss another measure of
variability, called the interquartile range (IQR), which is a more appropriate
measure of variability when you have skewed data.
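As a preview of why the IQR holds up better with skewed data, the sketch below compares it with the standard deviation on the same kind of hypothetical ages (made up for illustration). The helper name `iqr` is an assumption of this example; it relies on `statistics.quantiles`, whose default `'exclusive'` method is only one of several quartile conventions, so other software may report slightly different quartiles.

```python
from statistics import quantiles, stdev

def iqr(data):
    """Hypothetical helper: interquartile range, Q3 - Q1.

    Uses statistics.quantiles with its default 'exclusive' method; other
    quartile conventions can give slightly different Q1 and Q3.
    """
    q1, _median, q3 = quantiles(data, n=4)
    return q3 - q1

# Hypothetical ages (made up for illustration, not the Oscar data).
ages = [26, 28, 29, 30, 31, 33, 34, 35, 37, 41, 45]
ages_with_outlier = ages[:-1] + [81]  # oldest age replaced by an extreme value

print("IQR:", iqr(ages), "->", iqr(ages_with_outlier))
print("SD: ", round(stdev(ages), 2), "->", round(stdev(ages_with_outlier), 2))
```

The IQR only looks at the middle 50 percent of the data, so pushing the largest value farther out leaves it unchanged here, while the standard deviation jumps.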
Putting numbers with pictures
You can’t actually calculate measures of center and variability from the histogram
itself because you don’t know the exact data values. To add detail to
your findings, you should always calculate the basic statistics of center and
variation along with your histogram. (All the descriptive statistics you need,
and then some, appear in Chapter 5.)
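The reason a histogram alone isn't enough is that it records only how many values land in each bar. The sketch below builds two hypothetical data sets (made up for illustration) that produce identical bar heights with bins 5 years wide, yet have different means and medians. The helper `bin_counts` and the bin width are assumptions of this example.

```python
from collections import Counter
from statistics import mean, median

def bin_counts(data, width=5):
    """Hypothetical helper: histogram bar heights, counting values in [lo, lo + width)."""
    return Counter((x // width) * width for x in data)

# Two hypothetical data sets (made up for illustration).
a = [30, 31, 32, 40]
b = [34, 34, 33, 44]

print(bin_counts(a) == bin_counts(b))  # prints True: same histogram bars
print("a: mean", mean(a), "median", median(a))
print("b: mean", mean(b), "median", median(b))
```

Both histograms show three values in the 30 to 35 bar and one in the 40 to 45 bar, so only the raw data can tell you which mean and median are correct.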
Figure 7-1 is a histogram of the Best Actress ages; you can see that it is skewed
right. For Figure 7-3, I calculated some basic (that is, descriptive) statistics
from the same data set. Examining these numbers, you find that the median age is
33.00 years and the mean age is 35.69 years.
The mean age is higher than the median age because of a few actresses who
were quite a bit older than the rest when they won their awards. For example,
Jessica Tandy won for her role in Driving Miss Daisy when she was 81,
and Katharine Hepburn won the Oscar for On Golden Pond when she was 74.
The relationship between the median and mean confirms the skewness (to
the right) found in Figure 7-1.
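The mean-versus-median comparison can be written as a tiny check. This is a rough rule of thumb, not a formal skewness measure, and it can mislead for some unusual data shapes; the function name `skew_from_center` is hypothetical. The first call uses the summary values reported above; the second uses made-up ages for illustration.

```python
from statistics import mean, median

def skew_from_center(mean_value, median_value):
    """Hypothetical helper: guess skew direction from how the mean compares with the median."""
    if mean_value > median_value:
        return "right"  # a few large values (long right tail) pull the mean up
    if mean_value < median_value:
        return "left"   # a few small values (long left tail) pull the mean down
    return "no skew indicated"

# The Best Actress summary values reported above (mean 35.69, median 33.00).
print(skew_from_center(35.69, 33.00))  # prints "right"

# The same check on hypothetical raw ages (made up for illustration).
ages = [26, 28, 29, 30, 31, 33, 34, 35, 37, 41, 74, 81]
print(skew_from_center(mean(ages), median(ages)))  # prints "right"
```

Passing the two summary numbers, rather than the raw data, lets you apply the check directly to reported statistics such as those in Figure 7-3.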
12_9780470911082-ch07.indd 110 3/25/11 8:16 PM