Page 107 - Statistics II for Dummies
Chapter 5: Multiple Regression with Two X Variables
✓ If you can reject H0 based on your data, you conclude that the
correlation isn’t equal to zero, so the variables are related. More than
that, their relationship is deemed to be statistically significant; that is,
a correlation this strong would occur very rarely in a sample just by
chance if the variables were truly unrelated.
Any statistical software package can calculate a hypothesis test of a correlation
for you. The actual formulas used in that process are beyond the scope
of this book. However, the interpretation is the same as for any test: If the
p-value is smaller than your predetermined value of α (typically 0.05), reject
H0 and conclude x and y are related. Otherwise you can’t reject H0, and you
conclude you don’t have enough evidence to indicate that the variables are
related.
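The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and the p-value comes from a permutation test (shuffling y many times and counting how often a shuffled correlation is at least as strong as the observed one) rather than the formula-based test a statistical package would typically use. The interpretation of the p-value is the same either way.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value for H0: correlation = 0, estimated by shuffling y."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Made-up illustrative data (NOT the ads and sales data from the book).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9, 10.2, 10.8]

alpha = 0.05                           # predetermined significance level
r = pearson_r(x, y)
p_value = permutation_p_value(x, y)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"r = {r:.3f}, p-value = {p_value:.4f}: {decision}")
```

Because the made-up y values rise almost in lockstep with x, the p-value comes out far below 0.05 and the rule says to reject H0.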
In Minitab, you can conduct a hypothesis test for a correlation by clicking
on Stat>Basic Statistics>Correlation, and checking the Display p-values box.
Choose the variables you want to find correlations for, and click Select. You’ll
get output in the form of a little table that shows the correlation for each
pair of variables, with the respective p-value under each one. You can
see the correlation output for the ads and sales example in Figure 5-2.
Looking at Figure 5-2, the correlation of 0.791 between TV ads and sales has a
p-value of 0.000, which means it’s actually less than 0.001. That p-value is
far below 0.05 (your predetermined α level), so the result is highly
significant, and you conclude that TV ad spending is strongly related to
sales. The correlation between newspaper ad spending and sales is 0.594,
which is also statistically significant, with a p-value of 0.004.
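As a quick check of that reasoning, here is a small Python sketch that applies the p-value rule to the two results quoted from Figure 5-2. The correlations and p-values are the ones reported in the text; the variable and function names are hypothetical, chosen only for this illustration.

```python
ALPHA = 0.05  # predetermined significance level used in the text

# Correlations with sales and p-values as displayed in Figure 5-2.
# A displayed p-value of 0.000 is a rounded value, read as "less than 0.001".
reported = {
    "TV ads": {"r": 0.791, "p_displayed": 0.000},
    "Newspaper ads": {"r": 0.594, "p_displayed": 0.004},
}

def describe_p(p_displayed):
    """Turn a rounded, displayed p-value into an honest statement."""
    return "< 0.001" if p_displayed == 0.0 else f"= {p_displayed:.3f}"

def is_significant(p_displayed, alpha=ALPHA):
    """Reject H0 when the p-value is smaller than alpha."""
    return p_displayed < alpha

results = {}
for name, values in reported.items():
    results[name] = is_significant(values["p_displayed"])
    verdict = "significant" if results[name] else "not significant"
    print(f"{name}: r = {values['r']}, p {describe_p(values['p_displayed'])}, {verdict}")
```

Both p-values fall below 0.05, so both correlations with sales are flagged as statistically significant, matching the conclusions in the text.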
Checking for Multicollinearity
You have one more very important step to complete in the relationship-
exploration process before going on to using the multiple regression model.
You need to complete step four: looking at the relationship between the x
variables themselves and checking for redundancy. Failure to do so can lead
to problems during the model-fitting process.
Multicollinearity is the term you use when two x variables are highly correlated. Not
only is it redundant to include both related variables in the multiple regression
model, but it’s also problematic. The bottom line is this: If two x variables are
significantly correlated, include only one of them in the regression model, not
both. If you include both, the computer won’t know what numbers to give as
coefficients for each of the two variables because they share their contribution
to determining the value of y. Multicollinearity can really mess up the
model-fitting process and give answers that are inconsistent and often not
repeatable in subsequent studies.
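To see why the coefficients become unreliable, consider the following Python sketch. Everything in it is an assumption made for illustration: the data are constructed (not taken from the ads and sales example) so that x2 is nearly a copy of x1, and the two y lists stand in for two hypothetical studies whose y values differ by only 0.6 at each point. The helper names are hypothetical as well.

```python
from math import sqrt

def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def correlation(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / sqrt(dot(ca, ca) * dot(cb, cb))

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2   # close to zero when x1 and x2 are redundant
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    n = len(y)
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return b0, b1, b2

# Constructed data: x2 is x1 plus or minus 0.1, so the two are nearly identical.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.1, 3.9, 4.9, 6.1, 6.9, 8.1]

# Two hypothetical studies: y is 3 + 2*x1 plus small, different noise.
y_study_a = [5.3, 6.7, 9.3, 10.7, 12.7, 15.3, 16.7, 19.3]
y_study_b = [4.7, 7.3, 8.7, 11.3, 13.3, 14.7, 17.3, 18.7]

r_x1_x2 = correlation(x1, x2)
fit_a = fit_two_predictors(x1, x2, y_study_a)
fit_b = fit_two_predictors(x1, x2, y_study_b)

print(f"correlation between x1 and x2: {r_x1_x2:.4f}")
print("study A: b0 = %.2f, b1 = %.2f, b2 = %.2f" % fit_a)
print("study B: b0 = %.2f, b1 = %.2f, b2 = %.2f" % fit_b)
print(f"b1 + b2: study A = {fit_a[1] + fit_a[2]:.2f}, study B = {fit_b[1] + fit_b[2]:.2f}")
```

In this constructed case the individual coefficients swing widely between the two studies, and even change sign, while their sum stays at about 2. The data pin down the combined contribution of x1 and x2 but not how to split it between them, which is the sense in which the two variables share their contribution to y.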