Page 126 - Statistics for Dummies

                                         Part II: Number-Crunching Basics
                                                    represent clumps of data that are close together; a flat histogram shows data
                                                    equally dispersed, with more variability.
                                                    Variability in a histogram is higher when the taller bars are more spread out
                                                    around the mean and lower when the taller bars are close to the mean.
                                                    For the Best Actress Award winners’ ages shown in Figure 7-1, you see many
                                                    actresses are in the age range from 30 to 35, and most of the ages are between
                                                    20 and 50 years, which is quite diverse; then you have those outliers, those
                                                    few older actresses (I count 7 of them) who spread the data out farther,
                                                    increasing its overall variability.
                                                    The most common statistic used to measure variability in a data set is the
                                                    standard deviation, which in a rough sense measures the average distance
                                                    that the data lie from the mean. The standard deviation for the Best Actress
                                                    age data is 11.35 years. (See Chapter 5 for all the details on standard
                                                    deviation.) A standard deviation of 11.35 years is fairly large in the context
                                                    of this problem. Keep in mind, though, that the standard deviation is based on
                                                    average distance from the mean, and the mean is influenced by outliers, so the
                                                    standard deviation is influenced by outliers as well (see Chapter 5 for more
                                                    information).
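One quick way to see how outliers pull on the standard deviation is to compute it with and without them. What follows is only a sketch in Python: the ages are invented for illustration (they are not the Best Actress data), and statistics.stdev from the standard library computes the sample standard deviation.

```python
import statistics

# Hypothetical ages, invented for illustration (NOT the Best Actress data).
ages = [24, 26, 29, 30, 31, 33, 33, 35, 38, 41]

# The same ages plus two made-up older winners acting as outliers.
ages_with_outliers = ages + [72, 80]

# statistics.stdev computes the sample standard deviation (divides by n - 1).
sd_without = statistics.stdev(ages)
sd_with = statistics.stdev(ages_with_outliers)

print(f"Standard deviation without outliers: {sd_without:.2f}")
print(f"Standard deviation with outliers:    {sd_with:.2f}")
```

In this made-up sample, the two older ages move the mean upward and sit far from it, so the standard deviation grows noticeably once they are included.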
                                                    In the later section “Interpreting a boxplot,” I discuss another measure of
                                                    variability, called the interquartile range (IQR), which is a more appropriate
                                                    measure of variability when you have skewed data.
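As a rough preview of why the IQR holds up better with skewed data, the sketch below compares it with the standard deviation on an invented sample. The ages and the helper function iqr are assumptions made for this illustration, not part of the book's data. Note that statistics.quantiles (Python 3.8 or later) uses one of several quartile conventions, so other software may report slightly different quartiles.

```python
import statistics

def iqr(data):
    """Return Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    return q3 - q1

# Hypothetical ages, invented for illustration (NOT the Best Actress data).
ages = [24, 26, 29, 30, 31, 33, 33, 35, 38, 41]
ages_with_outliers = ages + [72, 80]  # two made-up outliers

iqr_without, iqr_with = iqr(ages), iqr(ages_with_outliers)
sd_without = statistics.stdev(ages)
sd_with = statistics.stdev(ages_with_outliers)

print(f"IQR: {iqr_without:.2f} -> {iqr_with:.2f}")
print(f"SD:  {sd_without:.2f} -> {sd_with:.2f}")
```

With these made-up numbers, adding the two older ages increases the standard deviation proportionally far more than the IQR, because the IQR looks only at the middle half of the data.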
                                                    Putting numbers with pictures
                                                    You can’t actually calculate measures of center and variability from the
                                                    histogram itself because you don’t know the exact data values. To add detail to
                                                    your findings, you should always calculate the basic statistics of center and
                                                    variation along with your histogram. (All the descriptive statistics you need,
                                                    and then some, appear in Chapter 5.)
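To illustrate why the histogram alone isn't enough, here is a small Python sketch in which two different data sets produce exactly the same bar heights yet have different means. Both data sets, the 5-year bin width, and the helper bin_counts are assumptions invented for this example.

```python
import statistics
from collections import Counter

BIN_WIDTH = 5  # assumed bin width for this illustration

def bin_counts(data):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    return Counter((x // BIN_WIDTH) * BIN_WIDTH for x in data)

# Two hypothetical data sets, invented for illustration only.
set_a = [30, 30, 35, 35, 35, 40]
set_b = [34, 34, 39, 39, 39, 44]

counts_a = bin_counts(set_a)
counts_b = bin_counts(set_b)

print("Same histogram bars?", counts_a == counts_b)
print(f"Mean of set A: {statistics.mean(set_a):.2f}")
print(f"Mean of set B: {statistics.mean(set_b):.2f}")
```

Because each bar only tells you how many values landed in its range, not where they landed within it, the exact mean, median, and standard deviation have to come from the data values themselves.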
                                                    Figure 7-1 is a histogram for the Best Actress ages; you can see it is skewed
                                                    right. Then for Figure 7-3, I calculated some basic (that is, descriptive)
                                                    statistics from the data set. Examining these numbers, you find the median age
                                                    is 33.00 years and the mean age is 35.69 years.
                                                    The mean age is higher than the median age because of a few actresses who
                                                    were quite a bit older than the rest when they won their awards. For example,
                                                    Jessica Tandy won for her role in Driving Miss Daisy when she was 81,
                                                    and Katharine Hepburn won the Oscar for On Golden Pond when she was 74.
                                                    The relationship between the median and mean confirms the skewness (to
                                                    the right) found in Figure 7-1.
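The pull that a few large values have on the mean, but not on the median, can be sketched in a few lines of Python. The ages below are invented for illustration and are not the actual Best Actress data.

```python
import statistics

# Hypothetical ages, invented for illustration (NOT the Best Actress data).
ages = [24, 26, 29, 30, 31, 33, 33, 35, 38, 41]
ages_with_outliers = ages + [72, 80]  # two made-up older winners

mean_without = statistics.mean(ages)
median_without = statistics.median(ages)
mean_with = statistics.mean(ages_with_outliers)
median_with = statistics.median(ages_with_outliers)

print(f"Without outliers: mean = {mean_without:.2f}, median = {median_without:.2f}")
print(f"With outliers:    mean = {mean_with:.2f}, median = {median_with:.2f}")
```

Without the two older ages, the mean and median of this made-up sample agree at 32; with them, the median only moves to 33 while the mean jumps past 39. A mean sitting above the median in this way is the same signal of right skewness described above.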









