Page 96 - Statistics II for Dummies
80 Part II: Using Different Types of Regression to Make Predictions
If the collected data was the result of a well-designed experiment that controls
for possible confounding variables, you can establish a cause-and-effect
relationship between x and y if they’re strongly correlated. Otherwise, you can’t
establish such a relationship. (See your Stats I text or Statistics For Dummies
for info regarding experiments.)
Extrapolation: The ultimate no-no
Plugging values of x into the model that fall outside of the reasonable
boundaries of x is called extrapolation. And one of my colleagues sums up
this idea very well: “Friends don’t let friends extrapolate.”
When you determine a best-fitting line for your data, you come up with an
equation that allows you to plug in a value for x and get a predicted value for
y. In algebra, if you find the equation of a line and graph it, the line typically
has an arrow on each end indicating it goes on forever in either direction. But
that doesn’t work for statistical problems (because statistics represents the
real world). When you’re dealing with real-world units like height, weight, IQ,
GPA, house prices, and the weight of your statistics textbook, only certain
numbers make sense.
So the first point is, don’t plug in values for x that don’t make any sense.
For example, if you’re estimating the price of a house (y) by using its square
footage (x), you wouldn’t think of plugging in a value of x like 10 square feet
or 100 square feet, because houses simply aren’t that small.
You also wouldn’t think about plugging in values like 1,000,000 square feet
for x (unless your “house” is the Ohio State football stadium or something).
It wouldn’t make sense. Likewise, if you’re estimating tomorrow’s temperature
using today’s temperature, negative numbers for x could possibly make
sense, but if you’re estimating the amount of precipitation tomorrow given
the amount of precipitation today, negative numbers for x (or y for that
matter) don’t make sense.
Choose only reasonable values of x for which you try to make estimates
about y — that is, look at the values of x for which your data was collected,
and stay within those bounds when making predictions. In the textbook-
weight example, the smallest average student weight is 48.5 pounds, and
the largest average student weight is 142 pounds. Choosing student weights
between 48.5 and 142 to plug in for x in the equation is okay, but choosing
values less than 48.5 or more than 142 isn’t a good idea. You can’t guarantee
that the same linear relationship (or any linear relationship for that matter)
continues outside the given boundaries.
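To make the "stay within the bounds" rule concrete, here is a minimal Python sketch that fits a least-squares line and refuses to predict when x falls outside the range of the collected data. The function names fit_line and predict_within_range are hypothetical, and the sample data is made up for illustration; only the endpoints 48.5 and 142 come from the textbook-weight example.

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_within_range(x_new, xs, ys):
    """Predict y at x_new, but only if x_new lies inside the observed x values."""
    low, high = min(xs), max(xs)
    if not (low <= x_new <= high):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{low}, {high}]; "
            "predicting here would be extrapolation"
        )
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * x_new


# Made-up illustration data (NOT the book's data set). Only the smallest and
# largest x values, 48.5 and 142, are taken from the text.
student_weights = [48.5, 70.0, 95.0, 120.0, 142.0]
textbook_weights = [8.0, 10.0, 12.5, 15.0, 17.0]

# Inside the observed range: a prediction is returned.
print(predict_within_range(100.0, student_weights, textbook_weights))

# Outside the observed range: the function raises instead of extrapolating.
try:
    predict_within_range(160.0, student_weights, textbook_weights)
except ValueError as err:
    print(err)
```

Raising an error rather than returning a number is one possible design choice; it forces whoever uses the model to notice that the requested x is beyond what the data supports.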