Statistics II for Dummies

Chapter 2: Finding the Right Analysis for the Job


                                Total score = 39.6 + 1.52 * Number of putts

                                So if you take 35 putts over an 18-hole round of golf, your total score is
                                predicted to be about 39.6 + 1.52 * 35 = 92.8, or 93. (Not bad for 18 holes!)
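The prediction above is simple arithmetic on the fitted line, so it can be written as a tiny function. This is only a minimal sketch: the function name predict_total_score is hypothetical, and the intercept (39.6) and slope (1.52) are the values from the golf equation.

```python
# Fitted line from the golf example: Total score = 39.6 + 1.52 * Number of putts
INTERCEPT = 39.6
SLOPE = 1.52

def predict_total_score(putts: float) -> float:
    """Return the predicted 18-hole total score for a given number of putts."""
    return INTERCEPT + SLOPE * putts

score = predict_total_score(35)
print(round(score, 1))  # 92.8
print(round(score))     # 93, the score rounded to a whole number of strokes
```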

                                Don’t try to predict y for x-values that fall outside the range in which the
                                data were collected; you have no guarantee that the line still works outside
                                that range, or that it even makes sense there. For the golf example, you can’t
                                say that if x (the number of putts) = 1, the total score would be
                                39.6 + 1.52 * 1 = 41.12 (unless you just call it good after your ball hits the
                                green). This mistake is called extrapolation.
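One way to guard against extrapolation is to refuse predictions for x-values outside the observed range. The text doesn't give the range of putts in the golf data, so observed_min and observed_max below are hypothetical parameters you would take from the smallest and largest x-values in your own data; the function name predict_within_range is also illustrative.

```python
def predict_within_range(putts, observed_min, observed_max, intercept=39.6, slope=1.52):
    """Predict total score, but refuse x-values outside the observed data range."""
    if not observed_min <= putts <= observed_max:
        # Outside the data, the line may not hold (or make sense): that's extrapolation.
        raise ValueError(
            f"{putts} putts is outside the observed range "
            f"[{observed_min}, {observed_max}]; predicting here would be extrapolation"
        )
    return intercept + slope * putts

# Hypothetical observed range; substitute the min and max putt counts from your data.
print(predict_within_range(35, observed_min=25, observed_max=45))
try:
    predict_within_range(1, observed_min=25, observed_max=45)
except ValueError as err:
    print(err)
```

Raising an error, rather than silently returning a number, keeps a nonsensical value like 41.12 from slipping into later calculations.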

                                You can discover more about simple linear regression, and extensions of it,
                                in Chapters 4 and 5.



                      Avoiding Bias


                                Bias is the bane of a statistician’s existence; it’s easy to create and very hard
                                (if not impossible) to deal with in most situations. The statistical definition
                                of bias is the systematic overestimation or underestimation of the actual
                                value. In language the rest of us can understand, it means that the results are
                                always off by a certain amount in a certain direction.

                                For example, a bathroom scale may always report a weight that’s five pounds
                                more than it should be (I’m convinced this is true of the scale at my doctor’s
                                office).
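A quick simulation shows why bias is different from ordinary random error: random noise averages out over many readings, but a systematic offset does not. This is a hypothetical sketch; the 5-pound offset echoes the bathroom-scale example, while the true weights, the noise level (noise_sd), and the function name average_error are made up for illustration.

```python
import random
from statistics import mean

def average_error(true_weights, offset, noise_sd, seed=0):
    """Average of (reading - true weight) for a scale that adds a fixed offset plus random noise."""
    rng = random.Random(seed)
    readings = [w + offset + rng.gauss(0, noise_sd) for w in true_weights]
    return mean([r - w for r, w in zip(readings, true_weights)])

weights = [120 + i % 80 for i in range(10_000)]  # hypothetical true weights, 120 to 199 lb

unbiased = average_error(weights, offset=0, noise_sd=2)  # fair scale: averages out near 0
biased = average_error(weights, offset=5, noise_sd=2)    # biased scale: stays near +5
print(f"unbiased scale: {unbiased:.2f} lb, biased scale: {biased:.2f} lb")
```

Taking more readings shrinks the random part of the error but leaves the 5-pound offset untouched, which is what "always off by a certain amount in a certain direction" means.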

                                Bias can show up in a data set in a variety of ways. Here are some of the most
                                common ways it can creep into your data:

                                  ✓ Selecting the sample from the population: Bias occurs when you either
                                    leave some groups out of the process that should have been included,
                                    or give certain groups too much weight.
                                    For example, TV surveys that ask viewers to phone in their opinion are
                                    biased because no one has selected a prior sample of people to represent
                                    the population — viewers who want to be involved select themselves to
                                    participate by calling in on their own. Statisticians have found that
                                    folks who decide to participate in “call-in” or Web site polls are very
                                    likely to have stronger opinions than those who have been randomly
                                    selected but choose not to get involved in such polls. Such samples are
                                    called self-selected samples and are typically very biased.
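To see how self-selection distorts results, the sketch below compares a random sample with a call-in sample drawn from the same population. The model is entirely hypothetical: each person's opinion strength is uniform between 0 and 1, and the chance that a person calls in is assumed to equal their opinion strength. The function name compare_samples is illustrative.

```python
import random
from statistics import mean

def compare_samples(pop_size=100_000, sample_size=1_000, seed=42):
    """Compare a random sample with a self-selected (call-in) sample.

    Assumed model: opinion strength is uniform on 0-1, and the chance
    that a person calls in equals their opinion strength.
    """
    rng = random.Random(seed)
    # Hypothetical population of opinion strengths (0 = indifferent, 1 = very strong).
    population = [rng.random() for _ in range(pop_size)]
    # Random sample: every person has the same chance of being chosen.
    random_sample = rng.sample(population, sample_size)
    # Self-selected sample: people with stronger opinions are more likely to call in.
    callers = [s for s in population if rng.random() < s]
    return mean(population), mean(random_sample), mean(callers)

pop_mean, random_mean, caller_mean = compare_samples()
print(f"population: {pop_mean:.3f}  random sample: {random_mean:.3f}  callers: {caller_mean:.3f}")
```

Under this assumed model the random sample stays close to the population average of about 1/2, while the callers' average drifts toward 2/3 (in theory, E[S^2]/E[S] = (1/3)/(1/2)), no matter how many people call in.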













