Page 303 - Statistics for Dummies
Chapter 18: Looking for Links: Correlation and Regression
Never do a regression analysis unless you have already found at least a mod-
erately strong correlation between the two variables. (My rule of thumb is it
should be at or beyond either positive or negative 0.50, but other statisticians
may have different criteria.) I’ve seen cases where researchers go ahead and
make predictions when a correlation is as low as 0.20! By anyone’s standards,
that doesn’t make sense. If the data don’t resemble a line to begin with, you
shouldn’t try to use a line to fit the data and make predictions (but people
still try).
Figuring out which variable is X and which is Y
Before moving forward to find the equation for your regression line, you have
to identify which of your two variables is X and which is Y. When doing cor-
relations (as I explain earlier in this chapter), the choice of which variable is X
and which is Y doesn’t matter, as long as you’re consistent for all the data.
But when fitting lines and making predictions, the choice of X and Y does
make a difference.
So how do you determine which variable is which? In general, Y is the vari-
able that you want to predict, and X is the variable you are using to make that
prediction. In the earlier cricket chirps example, you are using the number of
chirps to predict the temperature. So in this case the variable Y is the tem-
perature, and the variable X is the number of chirps. Hence Y can be predicted
by X using the equation of a line if a strong enough linear relationship exists.
Statisticians call the X-variable (cricket chirps in my earlier example) the
explanatory variable, because if X changes, the slope tells you (or explains)
how much Y is expected to change in response. Therefore, the Y variable is
called the response variable. Other names for X and Y include the independent
and dependent variables, respectively.
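To see why the choice of X and Y matters for a fitted line but not for a correlation, here is a minimal sketch in Python. The chirp and temperature numbers are made up for illustration (they are not the data from the earlier example), and the helper names correlation and least_squares_line are my own.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Correlation r between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line that predicts y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data, invented for this sketch only.
chirps = [18, 20, 21, 23, 27, 30, 34, 39]
temps = [57, 60, 64, 65, 68, 71, 74, 77]

# Swapping the two variables leaves the correlation unchanged...
r_xy = correlation(chirps, temps)
r_yx = correlation(temps, chirps)

# ...but the two regression lines are different lines.
slope_temp_from_chirps, _ = least_squares_line(chirps, temps)  # Y = temperature
slope_chirps_from_temp, _ = least_squares_line(temps, chirps)  # Y = chirps

print(f"r (chirps, temps) = {r_xy:.3f}, r (temps, chirps) = {r_yx:.3f}")
print(f"slope predicting temperature from chirps: {slope_temp_from_chirps:.3f}")
print(f"slope predicting chirps from temperature: {slope_chirps_from_temp:.3f}")
```

Notice that the second slope is not simply 1 divided by the first. Predicting chirps from temperature is a different fitting problem from predicting temperature from chirps, so you have to decide which variable is Y before you fit the line.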
Checking the conditions
In the case of two numerical variables, you can come up with a line that
enables you to predict Y from X, if (and only if) the following two conditions
from the previous sections are met:
✓ The scatterplot must form a linear pattern.
✓ The correlation, r, must be moderate to strong (typically at or beyond 0.50 or –0.50).
Some researchers actually don’t check these conditions before making pre-
dictions. Their claims are not valid unless the two conditions are met.
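The second condition can be checked with a few lines of code. Here is a rough sketch in Python, assuming the 0.50 rule of thumb as the cutoff; the function name correlation_condition_met and the two small data sets are hypothetical. The first condition still requires you to look at the scatterplot yourself, because a correlation by itself can't tell you whether the pattern is linear.

```python
from math import sqrt


def correlation(x, y):
    """Correlation r between two equal-length lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_condition_met(x, y, threshold=0.50):
    """True when r is at or beyond +threshold or -threshold.

    The 0.50 default is a rule of thumb, not a universal standard.
    """
    return abs(correlation(x, y)) >= threshold


# Hypothetical data, invented for this sketch only.
x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 5, 4, 5]      # rises fairly steadily with x
y_scattered = [3, 5, 2, 5, 3]   # no upward or downward trend

for label, y in (("roughly linear", y_linear), ("scattered", y_scattered)):
    verdict = "ok to fit a line" if correlation_condition_met(x, y) else "do not fit a line"
    print(f"{label}: {verdict}")
```

Because the check uses the absolute value of r, a strong downhill relationship passes just as a strong uphill one does.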