Page 95 - Statistics II for Dummies
Chapter 4: Getting in Line with Simple Linear Regression
Knowing the Limitations of Your Regression Analysis
The bottom line of any data analysis is to draw the correct conclusions from
your results. When you’re working with a simple linear regression model,
you can make three major errors. This section shows you those errors and
tells you how to avoid them.
Avoiding slipping into cause-and-effect mode
In a simple linear regression, you investigate whether x is related to y, and if
you get a strong correlation and a scatterplot that shows a linear trend, then
you find the best-fitting line and use it to estimate the value of y for reasonable
values of x.
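To make those steps concrete, here is a minimal sketch in Python. The five data points are invented purely for illustration (they are not the textbook-weight data), the names fit_line and estimate are hypothetical, and the sketch takes "reasonable values of x" to mean values inside the observed range, which is an assumption.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
    return slope, intercept, r

# Hypothetical data, made up for this sketch only
student_weight = [50, 60, 70, 80, 90]   # x, in pounds
textbook_weight = [5, 7, 8, 11, 12]     # y, in pounds

slope, intercept, r = fit_line(student_weight, textbook_weight)

def estimate(x):
    """Estimate y from the fitted line, but only for x within the observed range."""
    if not min(student_weight) <= x <= max(student_weight):
        raise ValueError("x is outside the range of the observed data")
    return intercept + slope * x

print(f"r = {r:.3f}")
print(f"line: y = {intercept:.2f} + {slope:.2f}x")
print(f"estimate at x = 75: {estimate(75):.2f}")
```

The range check in estimate is a design choice, not a rule from the regression itself: the line is fitted only to the data you have, so the sketch refuses to guess beyond it.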
There’s a fine line, however (no pun intended), that you don’t want to cross
with your interpretation of regression results. Be careful not to automatically
interpret the slope in a cause-and-effect mode when you’re using the regression
line to estimate the value of y using x. Doing so is a leap of faith that
can land you in hot water. Unless you have used a controlled experiment
to get the data, you can conclude only that the variables are correlated;
you can’t give a stone-cold guarantee about why they’re related.
In the textbook-weight example, you estimate the average weight of the
students’ textbooks by using the students’ average weight, but that doesn’t
mean increasing a particular child’s weight causes his textbook weight to
increase. Because of the strong positive correlation, you do know that
students with lower weights tend to have lower total textbook weights, and
students with higher weights tend to have higher textbook weights. But you
can’t take one particular third-grade student, increase his weight, and
presto, suddenly his textbooks weigh more.
The variable underlying the relationship between a child’s weight and the
weight of his textbooks is the student’s grade level in school; as grade
level increases, so might the size and number of his books, as well as the
homework coming home. Student grade level drives both student weight and
textbook weight. In this situation, student grade level is what
statisticians call a lurking variable; it’s a variable that wasn’t included in the
model but is related to both the explanatory variable (x) and the response (y).
A lurking variable confuses the issue of what’s causing what to happen.
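A small simulation can show how a lurking variable produces a strong correlation with no cause and effect behind it. The sketch below is an illustration under stated assumptions, not the book’s data: every number in it (six grade levels, 200 students per grade, the average weights, and the amount of random scatter) is invented. Grade level drives both weights, and within a grade the two weights are generated independently of each other.

```python
import random
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable

# All parameters below are hypothetical, chosen only for illustration.
rows = []
for grade in range(1, 7):
    for _ in range(200):
        # Grade level is the only thing that drives each weight;
        # the random scatter in one does not depend on the other.
        student_weight = 40 + 10 * grade + random.gauss(0, 5)
        textbook_weight = 2 + 1.5 * grade + random.gauss(0, 1)
        rows.append((grade, student_weight, textbook_weight))

# Correlation when all grades are pooled together
pooled_r = correlation([w for _, w, _ in rows], [t for _, _, t in rows])

# Correlation within each grade, where the lurking variable is held fixed
within_r = {}
for grade in range(1, 7):
    ws = [w for g, w, _ in rows if g == grade]
    ts = [t for g, _, t in rows if g == grade]
    within_r[grade] = correlation(ws, ts)

print(f"pooled r = {pooled_r:.2f}")
for grade, r in within_r.items():
    print(f"grade {grade}: r = {r:.2f}")
```

With the grades pooled, student weight and textbook weight are strongly correlated; within any single grade, the correlation should be close to zero. Making a student heavier changes nothing about his textbooks in this setup, because only grade level feeds into textbook weight.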