Page 62 - Intermediate Statistics for Dummies
Chapter 2: Sorting through Statistical Techniques
Don’t try to predict y for x-values that fall outside the range of where the data
was collected; you have no guarantee that the line still works outside of that
range, or that it will even make sense. For the golf example, you can’t say that
if x (the number of putts) = 0, the total score would be 39.6 + 1.52 * 0 = 39.6
(unless you just call it good after your ball hits the green). This mistake is
called extrapolation.
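To make the danger concrete, here is a minimal Python sketch of a prediction function that refuses to extrapolate. The line 39.6 + 1.52x comes from the golf example; the 25-to-40 putt range and the function name are assumptions made up for illustration, so substitute the smallest and largest x-values from your own data.

```python
# Fitted line from the golf example: score = 39.6 + 1.52 * putts
INTERCEPT = 39.6
SLOPE = 1.52

# ASSUMPTION: hypothetical range of putts actually observed in the data.
MIN_PUTTS = 25
MAX_PUTTS = 40


def predict_score(putts):
    """Predict total score, but only inside the observed range of putts."""
    if not MIN_PUTTS <= putts <= MAX_PUTTS:
        raise ValueError(
            f"{putts} putts is outside the observed range "
            f"{MIN_PUTTS} to {MAX_PUTTS}; predicting here is extrapolation"
        )
    return INTERCEPT + SLOPE * putts
```

Calling predict_score(30) returns a prediction, while predict_score(0) raises an error instead of quietly handing back 39.6.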
You can discover more about simple linear regression, and expansions on it,
in Chapters 4 and 5.
Avoiding Bias
Bias is the bane of a statistician’s existence; it’s easy to create and very hard
to deal with, if not impossible in most situations. The statistical definition of
bias is the systematic overestimation or underestimation of the actual value.
In language the rest of us can understand, it means that the results are always
off by a certain amount in a certain direction. For example, a bathroom scale
may always report a weight that’s five pounds more than it should be (I’m
convinced this is true of my doctor’s office scale); this consistent adding of
five pounds to every outcome represents a systematic overestimation of the
actual weight.
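If you want to see that definition in action, the following Python sketch simulates the bathroom scale. The five-pound offset comes from the example above; the true weight of 150 pounds, the amount of random wobble, and the number of readings are all assumptions chosen only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_WEIGHT = 150.0  # ASSUMPTION: hypothetical actual weight, in pounds
SCALE_OFFSET = 5.0   # the scale reads five pounds heavy, as in the example
NOISE_SD = 1.0       # ASSUMPTION: small random wobble in each reading

# Each reading is the true weight, plus the systematic offset, plus noise.
readings = [
    TRUE_WEIGHT + SCALE_OFFSET + rng.gauss(0, NOISE_SD)
    for _ in range(1000)
]

average_reading = sum(readings) / len(readings)
estimated_bias = average_reading - TRUE_WEIGHT  # average error, with its sign

print(f"Average reading: {average_reading:.1f} pounds")
print(f"Estimated bias: {estimated_bias:.1f} pounds")
```

The estimated bias should land close to five pounds. Taking more readings smooths out the random wobble, but it does nothing to the bias, because every reading is off in the same direction.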
The most important idea when dealing with bias is prevention, or at least
minimizing it. Bias is like weeds in a garden: After they’re present, they’re
very hard to deal with, and it’s always better to eliminate them from the start.
In this section, you see ways bias can creep into a data set, or even into a
statistic, and what you can do about it.
Looking at bias through statistical glasses
Bias can show up in a data set in a variety of ways. Here are some of
the most common ways bias can creep into your data:
Selecting the sample from the population: Bias occurs when you leave
some intended groups out of the process, and/or give certain groups too
much weight.
For example, TV surveys (the ones where they ask you to phone in
your opinion) are biased because no one has selected a prior sample of
people to represent the population — people call in on their own. When
people participate in a survey on their own, they’re more likely to have
stronger opinions than those who don’t choose to participate. Such
samples are called self-selected samples and are typically very biased.
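A rough Python simulation can show how a call-in survey drifts toward strong opinions. Everything here is an assumption for illustration: the opinion scale from -5 to +5, the made-up population, and especially the rule that a person's chance of calling in grows with the strength of that person's opinion.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# ASSUMPTION: opinions on a made-up scale from -5 (strongly against)
# to +5 (strongly for), spread evenly across the population.
population = [rng.uniform(-5, 5) for _ in range(10000)]


def average_strength(opinions):
    """Average strength of opinion, ignoring direction."""
    return sum(abs(o) for o in opinions) / len(opinions)


# Self-selected sample: ASSUMPTION that the chance a person calls in
# equals abs(opinion) / 5, so stronger opinions call in more often.
callers = [o for o in population if rng.random() < abs(o) / 5]

# Random sample of the same size, chosen by the researcher instead.
random_sample = rng.sample(population, len(callers))

print(f"Callers:       {average_strength(callers):.2f}")
print(f"Random sample: {average_strength(random_sample):.2f}")
```

Under these assumptions the callers show a noticeably higher average strength of opinion than the random sample, even though both groups come from the same population.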