Page 117 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
To head off the problem of multicollinearity, along with the correlations you
examine between each x variable and the response variable y, also find the
correlations between all pairs of x variables. If two x variables are highly
correlated, don’t leave them both in the model, or multicollinearity will result.
To see the correlations between all the x variables, have Minitab calculate a
correlation matrix of all the variables (see the section “Finding and interpreting
correlations”). You can ignore the correlations between the y variable and the
x variables and look only at the correlations between the x variables shown
in the correlation matrix. You find each of those correlations where the row for
one x variable intersects the column for the other.
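If you'd like to see the same idea outside Minitab, here is a minimal Python sketch of pulling out only the x-to-x correlations. The function names (pearson, x_correlations) and the numbers in data are made up for illustration; they are not the data behind Figure 5-2, and this is not how Minitab does the calculation internally.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sxx = sum((x - mean_a) ** 2 for x in a)
    syy = sum((y - mean_b) ** 2 for y in b)
    return sxy / sqrt(sxx * syy)

def x_correlations(columns, response):
    """Return {(name1, name2): r} for every pair of x variables.

    The response column is skipped, which mirrors ignoring the y row
    and y column of a full correlation matrix.
    """
    names = [name for name in columns if name != response]
    return {
        (names[i], names[j]): pearson(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }

# Hypothetical, made-up values used only to exercise the functions.
data = {
    "sales": [10, 12, 15, 11, 18, 20],
    "tv": [1, 2, 3, 1, 4, 5],
    "newspaper": [3, 1, 2, 3, 1, 2],
}

pairs = x_correlations(data, response="sales")
for (first, second), r in pairs.items():
    print(f"{first} vs {second}: r = {r:.3f}")
```

With two x variables you get a single pair back; with more x variables, the dictionary holds one entry per pair, which is the part of the correlation matrix you need for this check.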
If two x variables x₁ and x₂ are strongly correlated (that is, their correlation is
beyond +0.7 or –0.7), then one of them would do just about as good a job of
estimating y as the other, so you don’t need to include them both in the model.
If x₁ and x₂ aren’t strongly correlated, then both of them working together
would do a better job of estimating y than either variable alone. For the ad
spending example, you have to examine the correlation between the two x
variables, TV ad spending and newspaper ad spending, to be sure no
multicollinearity is present. The correlation between these two variables (as you can
see in Figure 5-2) is only 0.058. You don’t even need a hypothesis test to tell you
whether or not these two variables are related; they’re clearly not. However, if
you want to know, the p-value for the correlation between the spending for the
two ad types is 0.799 (see Figure 5-2), which is much, much larger than 0.05
and therefore not statistically significant.
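The ±0.7 rule of thumb is easy to express as a quick screening step. The sketch below is only an illustration: the names CUTOFF, strongly_correlated, and pairs_to_review are hypothetical, and the one correlation it checks is the 0.058 quoted above for TV and newspaper ad spending.

```python
CUTOFF = 0.7  # rule-of-thumb threshold for a "strong" correlation

def strongly_correlated(r, cutoff=CUTOFF):
    """True when r lies beyond +cutoff or -cutoff."""
    return abs(r) > cutoff

def pairs_to_review(correlations, cutoff=CUTOFF):
    """Return the x-variable pairs whose correlation is beyond the cutoff."""
    return [pair for pair, r in correlations.items()
            if strongly_correlated(r, cutoff)]

# Correlation between the two x variables, as quoted from Figure 5-2.
correlations = {("TV", "Newspaper"): 0.058}

flagged = pairs_to_review(correlations)
print(flagged)  # an empty list means no pair is beyond the cutoff
```

An empty result matches the conclusion in the text: neither x variable needs to be dropped on multicollinearity grounds.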
The large p-value for the correlation between spending for the two ad types
confirms your thoughts that both variables together may be helpful in
estimating y because each makes its own contribution. It also tells you that
keeping them both in the model will not create any multicollinearity
problems. (This completes step four of the multiple regression analysis, as listed
in the “Stepping through the analysis” section.)
Finding the Best-Fitting Model
After you have a group of x variables that are all related to y and not related
to each other (see previous sections), you’re ready to perform step five of the
multiple regression analysis (as listed in the “Stepping through the analysis”
section). That is, you’re ready to find the model that best fits the data.
In the multiple regression model with two x variables, you have the general
equation y = b₀ + b₁x₁ + b₂x₂, and you already know which x variables to
include in the model (by doing step four); the task now is to figure out which