Page 47 - Statistics II for Dummies
Chapter 2: Finding the Right Analysis for the Job
Total score = 39.6 + 1.52 * Number of putts
So if you have 35 putts on an 18-hole golf course, your total score is predicted
to be about 39.6 + 1.52 * 35 = 92.8, or 93. (Not bad for 18 holes!)
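The same arithmetic can be sketched in a few lines of Python. This is only an illustration of plugging a value into the fitted line; the function name predict_total_score is made up for this example, and the intercept and slope are the ones from the equation above.

```python
# Coefficients taken from the fitted line in the text:
# Total score = 39.6 + 1.52 * Number of putts
INTERCEPT = 39.6
SLOPE = 1.52


def predict_total_score(putts):
    """Return the predicted total score for a given number of putts."""
    return INTERCEPT + SLOPE * putts


predicted = predict_total_score(35)
print(f"Predicted score for 35 putts: {predicted:.1f}")
print(f"Rounded to a whole score: {round(predicted)}")
```

Rounding happens only at the end, which mirrors the text: compute 92.8 first, then report it as about 93.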
Don’t try to predict y for x-values that fall outside the range of where the data
was collected; you have no guarantee that the line still works outside of that
range or that it will even make sense. For the golf example, you can’t say that
if x (the number of putts) = 1 the total score would be 39.6 + 1.52 * 1 = 41.12
(unless you just call it good after your ball hits the green). This mistake is
called extrapolation.
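One way to protect yourself from extrapolating is to have your code refuse to predict outside the range of the data. The sketch below assumes the putts in the data set ran from 25 to 45; that range is hypothetical (the text doesn't give the real one), and the function name predict_within_range is also made up for illustration.

```python
# Coefficients from the fitted line in the text.
INTERCEPT = 39.6
SLOPE = 1.52

# ASSUMPTION: hypothetical range of putts observed in the data.
# The text does not state the real range; substitute your own.
MIN_PUTTS = 25
MAX_PUTTS = 45


def predict_within_range(putts):
    """Predict total score, but refuse to extrapolate beyond the data."""
    if not MIN_PUTTS <= putts <= MAX_PUTTS:
        raise ValueError(
            f"{putts} putts is outside the observed range "
            f"{MIN_PUTTS}-{MAX_PUTTS}; predicting here is extrapolation."
        )
    return INTERCEPT + SLOPE * putts


print(f"{predict_within_range(35):.1f}")  # inside the assumed range

try:
    predict_within_range(1)  # the one-putt example from the text
except ValueError as err:
    print(f"Refused: {err}")
```

Raising an error rather than returning a number is a design choice; a warning would also work if you want the value but need a reminder not to trust it.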
You can discover more about simple linear regression, and expansions on it,
in Chapters 4 and 5.
Avoiding Bias
Bias is the bane of a statistician’s existence; it’s easy to create and very hard
(if not impossible) to deal with in most situations. The statistical definition
of bias is the systematic overestimation or underestimation of the actual
value. In language the rest of us can understand, it means that the results are
always off by a certain amount in a certain direction.
For example, a bathroom scale may always report a weight that’s five pounds
more than it should be (I’m convinced this is true of the scale at my doctor’s
office).
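A small simulation can show what "systematically off in one direction" looks like. The sketch below assumes a true weight of 150 pounds and a little random wobble in each reading; both of those numbers are made up for illustration, and only the five-pound bias comes from the example above.

```python
import random
import statistics

TRUE_WEIGHT = 150.0  # ASSUMPTION: hypothetical true weight in pounds
BIAS = 5.0           # the scale reads five pounds heavy, as in the text
NOISE_SD = 1.0       # ASSUMPTION: hypothetical random wobble per reading


def simulate_readings(n, seed=0):
    """Simulate n readings from a scale that is biased upward."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [TRUE_WEIGHT + BIAS + rng.gauss(0, NOISE_SD) for _ in range(n)]


readings = simulate_readings(10_000)
mean_error = statistics.mean(readings) - TRUE_WEIGHT
print(f"Average error over {len(readings)} readings: {mean_error:.2f} pounds")
```

The random wobble averages out as you take more readings, but the average error stays near five pounds. That's the signature of bias: collecting more data doesn't make it go away.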
Bias can show up in a data set in a variety of different ways. Here are some of
the most common ways bias can creep into your data:
✓ Selecting the sample from the population: Bias occurs when you either
leave some groups out of the process that should have been included,
or give certain groups too much weight.
For example, TV surveys that ask viewers to phone in their opinion are
biased because no one has selected a prior sample of people to represent
the population; viewers who want to be involved select themselves
to participate by calling in on their own. Statisticians have found
that folks who decide to participate in “call-in” or Web site polls are very
likely to have stronger opinions than those who have been randomly
selected but choose not to get involved in such polls. Such samples are
called self-selected samples and are typically very biased.