Page 138 - Statistics for Dummies

                                         Part II: Number-Crunching Basics
                                                     Table 7-2    Best Actress Winners with Ages Designated as Outliers

                                                     Year    Name                 Age    Movie
                                                     1967    Katharine Hepburn    60     Guess Who’s Coming to Dinner
                                                     1968    Katharine Hepburn    61     The Lion in Winter
                                                     1985    Geraldine Page       61     Trip to Bountiful
                                                     2006    Helen Mirren         61     The Queen
                                                     1931    Marie Dressler       62     Min and Bill
                                                     1981    Katharine Hepburn    74     On Golden Pond
                                                     1989    Jessica Tandy        81     Driving Miss Daisy

                                                    The youngest of the outliers is 60 years old (Katharine Hepburn, 1967). Just
                                                    to compare, the next youngest age in the data set is 49 (Susan Sarandon,
                                                    1995). This indicates a clear break in this data set.

                                                     Making mistakes when interpreting a boxplot

                                                    It’s a common mistake to associate the size of the box in a boxplot with the
                                                    amount of data in the data set. Remember that each of the four sections
                                                    shown in the boxplot contains an equal percentage (25%) of the data; the
                                                    boxplot just marks off the places in the data set that separate those sections.
                                                    In particular, if the median splits the box into two unequal parts, the larger
                                                    part contains data that’s more variable than the other part, in terms of its
                                                    range of values. However, there is still the same amount of data (25%) in the
                                                    larger part of the box as there is in the smaller part.
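
To see the equal-percentage point in numbers, here is a minimal Python sketch. The `ages` list is made-up, right-skewed data (it is not the Best Actress data set), and the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method, which is only one of several conventions for computing quartiles. The sketch counts how many values land in each of the four sections and measures how wide each section is.

```python
from statistics import quantiles

# Hypothetical right-skewed data, invented for illustration only.
ages = [21, 22, 24, 25, 26, 27, 28, 29, 30, 31,
        33, 35, 38, 41, 45, 50, 58, 66, 74, 81]

# Three cut points that split the data into four sections.
q1, median, q3 = quantiles(ages, n=4, method="inclusive")
edges = [min(ages), q1, median, q3, max(ages)]

# Count the values in each section (no value equals a cut point here).
counts = [
    sum(1 for x in ages if x <= q1),
    sum(1 for x in ages if q1 < x <= median),
    sum(1 for x in ages if median < x <= q3),
    sum(1 for x in ages if x > q3),
]

# Width of each section on the number line.
widths = [edges[i + 1] - edges[i] for i in range(4)]

print(counts)  # [5, 5, 5, 5]
print(widths)  # [5.75, 5.25, 14.25, 34.75]
```

Every section holds 5 of the 20 values (25%), yet the upper part of the box (14.25 wide) is far wider than the lower part (5.25 wide). The width reflects how spread out those values are, not how many there are.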

                                                     Another common error involves sample size. A boxplot is a one-dimensional
                                                    graph with only one axis representing the variable being measured. There is
                                                    no second axis that tells you how many data points are in each group. So if
                                                    you see two boxplots side-by-side and one of them has a very long box and
                                                    the other has a very short one, don’t conclude that the longer one has more
                                                    data in it. The length of the box represents the variability in the data, not the
                                                    number of data values.
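
The sample-size trap can be sketched the same way. Both data sets below are invented for illustration, and the helper name `iqr` is an assumption of this sketch. The length of the box is the interquartile range (Q3 minus Q1), computed here with `statistics.quantiles` and the "inclusive" method.

```python
from statistics import quantiles

def iqr(data):
    """Return the interquartile range (Q3 - Q1), the length of the box."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical small sample with widely spread values.
small_spread = [10, 20, 30, 40, 50, 60, 70, 80]

# Hypothetical large sample with tightly clustered values (48 through 52).
large_tight = [48 + (i % 5) for i in range(200)]

print(len(small_spread), iqr(small_spread))  # 8 35.0
print(len(large_tight), iqr(large_tight))    # 200 2.0
```

The sample with only 8 values would draw the long box, and the sample with 200 values would draw the short one. Box length tracks variability and says nothing about how many data values went in.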

                                                     When viewing or making a boxplot, always make sure the sample size (n) is
                                                    included as part of the title. You can’t figure out the sample size otherwise.
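
If you build boxplots in code, one simple habit is to generate the title from the data so the sample size can't be left out. The helper below is a hypothetical sketch (the name `boxplot_title` and the example scores are made up); the resulting string can be passed to whatever plotting tool you use.

```python
def boxplot_title(label, data):
    """Build a boxplot title that always reports the sample size n."""
    return f"{label} (n = {len(data)})"

# Hypothetical data, invented for illustration only.
exam_scores = [62, 71, 75, 80, 88, 93]

title = boxplot_title("Exam scores", exam_scores)
print(title)  # Exam scores (n = 6)
```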