Page 128 - Statistics for Dummies

                                         Part II: Number-Crunching Basics
                                                     ✓ If the mean is much larger than the median, the data are generally
                                                        skewed right; a few values are larger than the rest.
                                                     ✓ If the mean is much smaller than the median, the data are generally
                                                        skewed left; a few smaller values bring the mean down.
                                                     ✓ If the mean and median are close, the data are fairly balanced,
                                                        or symmetric, on each side.
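The three checks above can be sketched in a few lines of Python. This is only an illustration: the function name `describe_shape`, the `tolerance` cutoff (how far apart the mean and median must be, as a fraction of the range, before they count as "much" different), and the sample numbers are all assumptions made for this example, not part of the rule itself.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.1):
    """Compare the mean and median to suggest the direction of skew.

    tolerance is a hypothetical cutoff: the gap between mean and median,
    as a fraction of the data's range, below which they count as "close".
    """
    spread = max(values) - min(values)
    if spread == 0:
        return "roughly symmetric"  # every value is identical
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "skewed right"   # a few large values pull the mean up
    if gap < -tolerance:
        return "skewed left"    # a few small values pull the mean down
    return "roughly symmetric"

# Made-up numbers: six similar values and one much larger one.
values = [30, 32, 35, 36, 38, 40, 250]
print(describe_shape(values))
```

A comparison like this is a quick hint, not a verdict; a histogram of the data is still the better way to judge its shape.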
                                                    Under certain conditions, you can put together the mean and standard devia-
                                                    tion to describe a data set in quite a bit of detail. If the data have a normal
                                                    distribution (a bell-shaped hill in the middle, sloping down at the same rate
                                                    on each side; see Chapter 5), the Empirical Rule can be applied.
                                                    The Empirical Rule (also in Chapter 5) says that if the data have a normal
                                                    distribution, about 68% of the data lie within 1 standard deviation of the
                                                    mean, about 95% lie within 2 standard deviations of the mean, and about
                                                    99.7% lie within 3 standard deviations of the mean. These percentages are
                                                    custom-made for the normal distribution (bell-shaped data) only and
                                                    can’t be used for data sets of other shapes.
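One way to see the Empirical Rule at work is to simulate some data and count. The sketch below is an assumption-laden illustration: the helper `share_within`, the simulated mean of 100 and standard deviation of 15, and the sample size are all invented for this example. It also draws a second, right-skewed data set to show that the same percentages need not hold for other shapes.

```python
import random
from statistics import mean, pstdev

def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data (mean 100 and SD 15 are arbitrary choices).
bell = [random.gauss(100, 15) for _ in range(20_000)]

# Simulated right-skewed data, for contrast.
skewed = [random.expovariate(1.0) for _ in range(20_000)]

for k in (1, 2, 3):
    print(f"within {k} SD: bell {share_within(bell, k):.3f}, "
          f"skewed {share_within(skewed, k):.3f}")
```

For the bell-shaped sample, the three shares should land close to 68%, 95%, and 99.7%. For the skewed sample, the share within 1 standard deviation typically comes out noticeably higher than 68%, which is why the rule is reserved for normal data.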
                                                    Detecting misleading histograms
                                                    There are no hard and fast rules for how to create a histogram; the person
                                                    making the graph gets to choose the groupings on the x-axis as well as the
                                                    scale and starting and ending points on the y-axis. Just because there is an
                                                    element of choice, however, doesn’t mean every choice is appropriate; in
                                                    fact, a histogram can be made to be misleading in many ways. In the following
                                                    sections, you see examples of misleading histograms and how to spot them.
                                                    Missing the mark with too few groups
                                                    Although the number of groups you use for a histogram is up to the discre-
                                                    tion of the person making the graph, there is such a thing as going overboard,
                                                    either by having way too few bars, with everything lumped together, or by
                                                    having way too many bars, where every little difference is magnified.
                                                    To decide how many bars a histogram should have, I take a good look at the
                                                    groupings used to form the bars on the x-axis and see if they make sense.
                                                    For example, it doesn’t make sense to talk about exam scores in groups of 2
                                                    points; that’s too much detail — too many bars. On the other hand, it doesn’t
                                                    make sense to group actresses’ ages by intervals of 20 years; that’s not
                                                    descriptive enough.
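To get a feel for how the grouping width changes a histogram, you can count the bars that different widths produce. The following is a minimal sketch: the function `bin_counts` and the list of exam scores are hypothetical, invented purely for illustration, and the widths tried (2, 10, and 50 points) are arbitrary.

```python
from collections import Counter

def bin_counts(values, width, start=0):
    """Count the values in each group [left, left + width), keyed by left edge."""
    counts = Counter((v - start) // width for v in values)
    return {start + i * width: counts[i] for i in sorted(counts)}

# Hypothetical exam scores, made up for this example only.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 76,
          78, 79, 81, 83, 84, 86, 88, 91, 94, 97]

for width in (2, 10, 50):
    bars = bin_counts(scores, width)
    print(f"groups of {width}: {len(bars)} non-empty bars -> {bars}")
```

With these made-up scores, groups of 2 points scatter the 20 values across 18 bars that mostly hold a single score, groups of 50 points lump everything into one bar, and groups of 10 points give 5 bars that show where the scores pile up.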
                                                    Figures 7-4 and 7-5 illustrate this point. Each histogram summarizes n = 222
                                                    observations of the amount of time between eruptions of the Old Faithful
                                                    geyser in Yellowstone Park. Figure 7-4 uses six bars that group the data by








