Page 96 - Statistics for Dummies

                                         Part II: Number-Crunching Basics
                                                    For example, if someone told you that the average starting salary for someone
                                                    working at Company Statistix is $70,000, you may think, “Wow! That’s great.”
                                                    But if the standard deviation for starting salaries at Company Statistix is
                                                    $20,000, that’s a lot of variation in terms of how much money you can make,
                                                    so the average starting salary of $70,000 isn’t as informative in the end, is it?
                                                    On the other hand, if the standard deviation was only $5,000, you would have
                                                    a much better idea of what to expect for a starting salary at that company.
                                                    Which is more appealing? That’s a decision each person has to make;
                                                    however, it’ll be a much more informed decision once you realize that
                                                    standard deviation matters.
                                                    Without the standard deviation, you can’t compare two data sets effectively.
                                                    Suppose two sets of data have the same average; does that mean that the
                                                    data sets must be exactly the same? Not at all. For example, the data sets
                                                    (199, 200, 201) and (0, 200, 400) have the same average (200), yet they have
                                                    very different standard deviations. The first data set has a very small
                                                    standard deviation (s = 1) compared to the second data set (s = 200).
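To check the comparison above, here is a minimal Python sketch that computes the average and the sample standard deviation of both data sets. It assumes that s means the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the variable names `tight` and `spread` are made up for illustration.

```python
from statistics import mean, stdev

# The two data sets from the text (variable names are illustrative).
tight = [199, 200, 201]
spread = [0, 200, 400]

def summarize(data):
    """Return (average, sample standard deviation) for a list of numbers."""
    return mean(data), stdev(data)  # stdev divides by n - 1

for name, data in (("tight", tight), ("spread", spread)):
    avg, s = summarize(data)
    print(f"{name}: average = {avg}, s = {s}")
```

Both data sets report the same average, but the standard deviations differ by a factor of 200, which is exactly the information the average alone hides.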
                                                    References to the standard deviation may become more commonplace in the
                                                    media as more and more people (like you, for example) discover what the stan-
                                                    dard deviation can tell them about a set of results and start asking for it. In your
                                                    career, you are likely to see the standard deviation reported and used as well.
                                                    Being out of range
                                                    The range is another statistic that some folks use to measure variation in a
                                                    data set. The range is the largest value in the data set minus the smallest value
                                                    in the data set. It’s easy to find; all you do is put the numbers in order (from
                                                    smallest to largest) and do a quick subtraction. Maybe that’s why the range is
                                                    used so often; it certainly isn’t because of its interpretative value.
                                                    The range of a data set is almost meaningless. It depends on only two num-
                                                    bers in the data set, both of which may reflect extreme values (outliers). My
                                                    advice is to ignore the range and find the standard deviation, which is a more
                                                    informative measure of the variation in the data set because it involves all
                                                    the values. Or you can calculate another statistic called the interquartile
                                                    range, which is similar to the range with an important difference: it largely
                                                    avoids outlier and skewness issues by looking only at the middle 50% of the
                                                    data and finding the range for those values. The section “Exploring interquar-
                                                    tile range” at the end of this chapter gives you more details.
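The effect of a single outlier on the range can be sketched in a few lines of Python. The data values below are hypothetical, not from the book, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the hand method described later in the chapter.

```python
from statistics import quantiles, stdev

# Hypothetical data: the second list replaces the largest value with an outlier.
base = [10, 12, 13, 15, 16, 18, 20]
with_outlier = [10, 12, 13, 15, 16, 18, 95]

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' method (an assumption)
    return q3 - q1

for name, data in (("base", base), ("with_outlier", with_outlier)):
    print(f"{name}: range = {data_range(data)}, "
          f"s = {stdev(data):.2f}, IQR = {iqr(data)}")
```

With these numbers the range jumps from 10 to 85 when one value changes, while the interquartile range stays at 6 because the middle 50% of the data is untouched. The standard deviation also grows, since it involves every value, including the outlier.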











