Page 62 - Intermediate Statistics for Dummies

                                                                       Chapter 2: Sorting through Statistical Techniques
                                                    Don’t try to predict y for x-values that fall outside the range where the data
                                                    was collected; you have no guarantee that the line still works outside of that
                                                    range, or that it will even make sense. For the golf example, you can’t say that
                                                    if x (the number of putts) = 0, the total score would be 39.6 + 1.52 * 0 = 39.6
                                                    (unless you just call it good after your ball hits the green). This mistake is
                                                    called extrapolation.
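One way to guard against extrapolation is to have the prediction refuse any x-value outside the range where the data was collected. The sketch below uses the golf line from the text (39.6 + 1.52 * putts); the range of 25 to 40 putts and the function name predict_score are assumptions made up for illustration, because the text doesn't state the observed range.

```python
# Fitted line from the golf example: total score = 39.6 + 1.52 * putts
INTERCEPT = 39.6
SLOPE = 1.52

# ASSUMPTION: hypothetical range of putts seen in the data.
# The text does not give the real range; replace with your own.
X_MIN, X_MAX = 25, 40


def predict_score(putts):
    """Predict total score, refusing to extrapolate outside the observed range."""
    if not (X_MIN <= putts <= X_MAX):
        raise ValueError(
            f"{putts} putts is outside the observed range "
            f"[{X_MIN}, {X_MAX}]; predicting here would be extrapolation"
        )
    return INTERCEPT + SLOPE * putts
```

Under these assumptions, predict_score(30) returns a prediction, while predict_score(0) raises an error instead of quietly reporting a score of 39.6.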
                                                    You can discover more about simple linear regression, and expansions on it,
                                                    in Chapters 4 and 5.
                                         Avoiding Bias
                                                    Bias is the bane of a statistician’s existence; it’s easy to create and very hard
                                                    to deal with, if not impossible in most situations. The statistical definition of
                                                    bias is the systematic overestimation or underestimation of the actual value.
                                                    In language the rest of us can understand, it means that the results are always
                                                    off by a certain amount in a certain direction. For example, a bathroom scale
                                                    may always report a weight that’s five pounds more than it should be (I’m
                                                    convinced this is true of my doctor’s office scale); this consistent adding of
                                                    five pounds to every outcome represents a systematic overestimation of the
                                                    actual weight.
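To see what "systematic" means in numbers, here is a minimal sketch that compares a scale that always reads five pounds heavy with a scale whose errors go both ways. All the weights and errors are made-up values for illustration, and average_error is a hypothetical helper name, not something from the text.

```python
from statistics import mean

# ASSUMPTION: made-up actual weights (in pounds) for illustration only.
true_weights = [120, 150, 180, 200]

# A biased scale: every reading is 5 pounds too high.
biased_readings = [w + 5 for w in true_weights]

# A merely noisy scale: errors go up and down and cancel out on average.
noisy_errors = [-2, 2, -1, 1]
noisy_readings = [w + e for w, e in zip(true_weights, noisy_errors)]


def average_error(readings, actual):
    """Average of (reading - actual); a value far from 0 suggests bias."""
    return mean([r - a for r, a in zip(readings, actual)])


print(average_error(biased_readings, true_weights))  # 5: off in one direction
print(average_error(noisy_readings, true_weights))   # 0: errors cancel out
```

The biased scale's average error stays at five pounds no matter how many people step on it, which is why collecting more data doesn't make bias go away.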
                                                    The most important idea when dealing with bias is preventing it, or at least
                                                    minimizing it. Bias is like weeds in a garden: After they’re present, they’re
                                                    very hard to deal with, and it’s always better to eliminate them from the start.
                                                    In this section, you see ways bias can creep into a data set, or even into a
                                                    statistic, and what you can do about it.
                                                    Looking at bias through statistical glasses
                                                    Bias can show up in a data set in a variety of ways. Here are some of
                                                    the most common ways bias can creep into your data:
                                                       ✓ Selecting the sample from the population: Bias occurs when you leave
                                                        some intended groups out of the process, and/or give certain groups too
                                                        much weight.
                                                        For example, TV surveys (the ones where they ask you to phone in
                                                        your opinion) are biased because no one has selected a prior sample of
                                                        people to represent the population — people call in on their own. When
                                                        people participate in a survey on their own, they’re more likely to have
                                                        stronger opinions than those who don’t choose to participate. Such samples
                                                        are called self-selected samples and are typically very biased.
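A quick simulation can show how a self-selected sample drifts away from the population. This is only a sketch: the opinion scale (-2 to 2), the call-in probabilities, and the names CALL_IN_PROB and mean_strength are all assumptions invented for illustration, not figures from any real survey.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: opinions on a scale from -2 (strongly against) to 2 (strongly for),
# spread evenly across a hypothetical population of 10,000 viewers.
population = [random.choice([-2, -1, 0, 1, 2]) for _ in range(10_000)]

# ASSUMPTION: the stronger the opinion, the more likely the viewer phones in.
CALL_IN_PROB = {0: 0.02, 1: 0.10, 2: 0.50}

# Nobody selects the sample; each viewer decides whether to call.
callers = [op for op in population if random.random() < CALL_IN_PROB[abs(op)]]


def mean_strength(opinions):
    """Average strength of opinion, ignoring direction."""
    return sum(abs(op) for op in opinions) / len(opinions)


print("population:", mean_strength(population))
print("callers:   ", mean_strength(callers))
```

Under these assumptions, the callers hold noticeably stronger opinions on average than the population they are supposed to represent, which is the bias a self-selected sample introduces.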