Chapter 5: When Two Variables Are Better than One: Multiple Regression
The letter ρ is the Greek version of r and represents the true correlation of x
and y in the entire population; r is the correlation coefficient of the sample.
Any statistical software package can calculate a hypothesis test of a correlation for you. The actual formulas used in that process are beyond the scope of this book. However, the interpretation is the same as for any test: If the p-value is smaller than your prespecified value of α (typically 0.05), reject Ho and conclude that x and y are related. Otherwise you can't reject Ho, and you conclude that you don't have enough evidence that the variables are related.
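If you're working outside Minitab, the same test is available in other software. Here's a minimal sketch in Python using `scipy.stats.pearsonr`, which returns both the sample correlation r and the p-value for testing Ho: ρ = 0. The data here is made up purely for illustration; it isn't from the book's example.

```python
from scipy.stats import pearsonr

# Hypothetical sample data (not the book's ads-and-sales data)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

# pearsonr returns the sample correlation r and the p-value
# for testing Ho: rho = 0 (no correlation in the population)
r, p_value = pearsonr(x, y)

alpha = 0.05  # the typical prespecified significance level
if p_value < alpha:
    print(f"r = {r:.3f}, p = {p_value:.4f}: reject Ho; x and y are related")
else:
    print(f"r = {r:.3f}, p = {p_value:.4f}: can't reject Ho; not enough evidence")
```

Because these made-up points fall nearly on a straight line, the p-value comes out tiny and you reject Ho.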
In Minitab, you can conduct a hypothesis test for a correlation by clicking
on Stat>Basic Statistics>Correlation, and checking the Display p-values box.
Choose the variables you want to find correlations for, and click Select. You'll get output in the form of a small table that shows the correlation for each pair of variables, with the respective p-value under each one. You can see the correlation output for the ads and sales example in Figure 5-2.
Looking at Figure 5-2, the correlation of 0.791 between TV ads and sales has a p-value of 0.000, which means it's actually less than 0.001. That's a highly significant result, much less than 0.05 (your predetermined α level). So TV ad spending is strongly related to sales. The correlation between newspaper ad spending and sales is 0.594, which is also statistically significant, with a p-value of 0.004.
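You can produce a table in the same spirit as Minitab's Correlation output in Python, looping over each pair of variables. The numbers below are invented stand-ins for the TV ads, newspaper ads, and sales data; they are not the book's actual values, so the r's and p-values won't match Figure 5-2.

```python
from itertools import combinations
from scipy.stats import pearsonr

# Hypothetical data standing in for the book's example (not the real values)
data = {
    "TV": [5, 7, 6, 9, 11, 8, 12, 10, 14, 13],
    "Newspaper": [2, 4, 3, 5, 4, 6, 5, 7, 6, 8],
    "Sales": [20, 25, 24, 30, 33, 29, 36, 34, 40, 39],
}

# Print each pairwise correlation with its p-value,
# similar in spirit to Minitab's Display p-values output
for (name1, vals1), (name2, vals2) in combinations(data.items(), 2):
    r, p = pearsonr(vals1, vals2)
    print(f"{name1} vs {name2}: r = {r:.3f}, p-value = {p:.3f}")
```

Reading such a table is the same as before: any pair whose p-value falls below your α level is declared significantly correlated.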
Checking for Multicollinearity
You have one more very important step to complete in the relationship-
exploration process before going on to using the multiple regression model.
That is, you need to complete step four: looking at the relationship between
the x variables themselves and checking for redundancy. Failure to do so can
lead to problems during the model-fitting process.
Multicollinearity is the term used when two x variables are highly correlated. Not only is it redundant to include both related variables in the multiple regression model, but it's also problematic. The bottom line is this: If two x variables are significantly correlated, include only one of them in the regression model, not both. If you include both, the computer won't know what numbers to give as coefficients for each of the two variables, because they share their contribution to determining the value of y. Multicollinearity can really mess up the model-fitting process and give answers that are inconsistent and often not repeatable in subsequent studies.
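The check itself is just another correlation test, this time between the x variables rather than between an x and y. Here's a small sketch (again in Python with scipy, and again with made-up numbers): x2 is built to be nearly double x1, so the two predictors are redundant.

```python
from scipy.stats import pearsonr

# Hypothetical predictors: x2 is roughly 2 * x1, so the two are redundant
x1 = [2, 4, 5, 7, 8, 10, 11, 13]
x2 = [4.1, 7.9, 10.2, 13.8, 16.1, 19.9, 22.2, 25.8]

# Test Ho: rho = 0 for the pair of x variables
r, p = pearsonr(x1, x2)
print(f"r = {r:.3f}, p = {p:.4f}")

if p < 0.05:
    # Multicollinearity warning: keep only one of these in the model
    print("x1 and x2 are significantly correlated -- include only one of them")
```

With a correlation this strong, the test flags the pair immediately, and you'd drop one of the two variables before fitting the multiple regression model.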