Page 108 - Statistics II for Dummies
92 Part II: Using Different Types of Regression to Make Predictions
To head off the problem of multicollinearity, along with the correlations you
examine regarding each x variable and the response variable y, also find the
correlations between all pairs of x variables. If two x variables are highly
correlated, don’t leave them both in the model, or multicollinearity will result.
To see the correlations between all the x variables, have Minitab calculate
a correlation matrix of all the variables (see the section “Finding and
interpreting correlations”). You can ignore the correlations between the
y variable and the x variables and only choose the correlations between
the x variables shown in the correlation matrix. Find those correlations by
intersecting the rows and columns of the x variables for which you want
correlations.
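As a sketch of this step outside Minitab, you can build the same correlation matrix with pandas. The data below is made up purely for illustration; only the x columns go into the matrix, since the correlations with y have already been checked:

```python
import pandas as pd

# Hypothetical ad-spending data (all values invented for illustration)
data = pd.DataFrame({
    "tv_spending":   [12.0, 8.5, 15.2, 9.8, 11.4, 13.7],
    "news_spending": [3.1, 4.0, 2.7, 3.8, 3.3, 2.9],
    "sales":         [22.1, 18.4, 25.0, 19.7, 21.3, 23.8],
})

# Correlation matrix of the x variables only -- drop the response y,
# then read each pairwise correlation at the row/column intersection
x_corr = data.drop(columns="sales").corr()
print(x_corr)
```

With only two x variables the matrix has a single off-diagonal entry to inspect; with more predictors, every row/column intersection is a pair you need to check.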
If two x variables x₁ and x₂ are strongly correlated (that is, their correlation
is beyond +0.7 or –0.7), then one of them would do just about as good a job
of estimating y as the other, so you don’t need to include them both in the
model. If x₁ and x₂ aren’t strongly correlated, then both of them working
together would do a better job of estimating sales than either variable alone.
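The beyond-±0.7 rule of thumb amounts to a one-line check on each pairwise correlation (the function name and default threshold here are just illustrative):

```python
def strongly_correlated(r, threshold=0.7):
    """Flag a pair of x variables whose correlation is beyond +/- 0.7."""
    return abs(r) > threshold

# Beyond +0.7 or -0.7: drop one of the pair from the model
print(strongly_correlated(0.85))    # True
print(strongly_correlated(-0.90))   # True
# Weak correlation: keep both predictors
print(strongly_correlated(0.058))   # False
```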
For the ad-spending example, you have to examine the correlation between
the two x variables, TV ad spending and newspaper ad spending, to be sure no
multicollinearity is present. The correlation between these two variables (as
you can see in Figure 5-2) is only 0.058. You don’t even need a hypothesis test
to tell you whether or not these two variables are related; they’re clearly not.
The p-value for the correlation between the spending for the two ad types is
0.799 (see Figure 5-2), which is far larger than 0.05 and therefore isn’t
statistically significant. The large p-value for the
correlation between spending for the two ad types confirms your thoughts
that both variables together may be helpful in estimating y because each
makes its own contribution. It also tells you that keeping them both in the
model won’t create any multicollinearity problems. (This completes step four
of the multiple regression analysis, as listed in the “Stepping through the
analysis” section.)
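If you want to reproduce this kind of correlation-plus-p-value check in code, SciPy's `pearsonr` returns both numbers at once. This sketch assumes SciPy is available and uses randomly generated, unrelated spending amounts, so the exact r and p-value will differ from the 0.058 and 0.799 in Figure 5-2:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical, independently generated spending amounts (illustration only)
tv = rng.normal(10, 2, size=20)
news = rng.normal(4, 1, size=20)

# pearsonr returns the correlation and the p-value for testing whether
# the true correlation is zero
r, p_value = pearsonr(tv, news)
if p_value > 0.05:
    print(f"r = {r:.3f}, p = {p_value:.3f}: not significant; keep both predictors")
else:
    print(f"r = {r:.3f}, p = {p_value:.3f}: related; drop one of the pair")
```

A p-value above 0.05 says you can't conclude the two x variables are related, which is exactly the green light the text describes for keeping both in the model.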
Finding the Best-Fitting Model
for Two x Variables
After you have a group of x variables that are all related to y and not related
to each other (refer to previous sections), you’re ready to perform step five
of the multiple regression analysis (as listed in the “Stepping through the
analysis” section). You’re ready to find the best-fitting model for the data.
In the multiple regression model with two x variables, you have the general
equation y = b₀ + b₁x₁ + b₂x₂, and you already know which x variables to
include in the model (by doing step four in the previous section); the task
now is to figure out which coefficients (numbers) to put in for b₀, b₁, and b₂,