Page 106 - Intermediate Statistics for Dummies
Chapter 4: Getting in Line with Simple Linear Regression
In the textbook weight example, you estimate the average weight of the students’ textbooks by using the students’ body weights, but that doesn’t mean that increasing a particular child’s weight causes his textbook weight to increase. Because of the strong positive correlation, you do know that students with lower weights tend to have lower total textbook weights, and students with higher weights tend to have higher textbook weights. But you can’t take one particular third-grade student, increase his weight, and presto, suddenly his textbooks weigh more.
The variable underlying the relationship between a child’s weight and the weight of his backpack is the student’s grade level: as grade level increases, so does the size of his books. Student grade level drives both student weight and textbook weight. In this situation, student grade level is what statisticians call a confounding variable: a variable that wasn’t included in the study but is related to both the explanatory variable (x) and the response variable (y), and that confounds, or confuses, the issue of what is causing what to happen.
If the collected data was the result of a well-designed experiment that controls for possible confounding variables, you can establish a cause-and-effect relationship between x and y if they’re strongly correlated. Otherwise, you can’t.
Extrapolation: The ultimate no-no
Plugging values of x into the model that fall outside the reasonable boundaries of x is called extrapolation. And one of my colleagues sums up this idea very well: “Friends don’t let friends extrapolate.”
When you determine a best-fitting line for your data, you come up with an equation that allows you to plug in a value for x and get a predicted value for y. In algebra, if you found the equation of a line and graphed it, the line would typically have an arrow on each end indicating that it goes on forever in both directions. But that doesn’t work for statistical problems (’cause statistics represents the real world). What I mean is that when you’re dealing with real-world units like height, weight, IQ, GPA, house prices, and the weight of your statistics textbook, only certain numbers make sense.
So the first point is: don’t plug in values for x that don’t make any sense. For example, if you’re estimating the price of a house (y) using its square footage (x), you wouldn’t think of plugging in a value of x like 10 square feet or 100 square feet, because houses simply aren’t that small. You also wouldn’t think about plugging in values like 1,000,000 square feet for x (unless your “house” is the Ohio State football stadium or the like). It wouldn’t make sense. If you’re estimating tomorrow’s temperature using today’s temperature, negative numbers for x could possibly make sense, but if you’re estimating the amount of precipitation tomorrow given the amount of precipitation today, negative numbers for x (or y, for that matter) don’t make sense.