Page 117 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
To head off the problem of multicollinearity, along with the correlations you examine between each x variable and the response variable y, also find the correlations between all pairs of x variables. If two x variables are highly correlated, don’t leave them both in the model, or multicollinearity will result. To see the correlations between all the x variables, have Minitab calculate a correlation matrix of all the variables (see the section “Finding and interpreting correlations”). For this step, you can ignore the correlations between the y variable and the x variables and look only at the correlations between the x variables shown in the correlation matrix. Find each of those correlations at the intersection of the row for one x variable and the column for the other.
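As a rough illustration of what that x-only part of the correlation matrix contains, here is a minimal sketch in plain Python rather than Minitab. The function names `pearson_r` and `x_correlations` and the small data set are made up for this example; they are not the book's ad-spending data or anything Minitab provides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def x_correlations(data, response):
    """Return {(name_a, name_b): r} for every pair of x variables.

    The response column is skipped, which mirrors ignoring the y row
    and y column of the full correlation matrix.
    """
    x_names = [name for name in data if name != response]
    pairs = {}
    for i, a in enumerate(x_names):
        for b in x_names[i + 1:]:
            pairs[(a, b)] = pearson_r(data[a], data[b])
    return pairs

# Made-up illustrative numbers (hypothetical, not from the book).
data = {
    "sales": [10, 12, 15, 19, 24, 30],
    "tv": [1, 2, 3, 4, 5, 6],
    "newspaper": [3, 1, 4, 1, 5, 2],
}
x_pairs = x_correlations(data, response="sales")
for (a, b), r in x_pairs.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With only two x variables there is a single pair to check; with more x variables, the same loop lists every pair once.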

If two x variables x₁ and x₂ are strongly correlated (that is, their correlation is beyond +0.7 or –0.7), then one of them would do just about as good a job of estimating y as the other, so you don’t need to include them both in the model. If x₁ and x₂ aren’t strongly correlated, then both of them working together would do a better job of estimating y than either variable alone. For the ad spending example, you have to examine the correlation between the two x variables, TV ad spending and newspaper ad spending, to be sure no multicollinearity is present. The correlation between these two variables (as you can see in Figure 5-2) is only 0.058. You don’t even need a hypothesis test to tell you whether or not these two variables are related; they’re clearly not. However, if you want to know, the p-value for the correlation between the spending for the two ad types is 0.799 (see Figure 5-2), which is much, much larger than 0.05 ever thought of being, so the correlation is not statistically significant.
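The ±0.7 rule of thumb is easy to express in code. The sketch below is an illustration only: the function names are invented here, and the t statistic shown is the standard one for testing whether a correlation differs from zero, which is assumed (not confirmed by the text) to be what lies behind the p-value in Figure 5-2. Turning t into a p-value needs a t-distribution with n − 2 degrees of freedom from statistical software, so that step is left out.

```python
import math

def is_strongly_correlated(r, cutoff=0.7):
    """True when r is beyond +cutoff or -cutoff (the rule of thumb in the text)."""
    return abs(r) > cutoff

def correlation_t_statistic(r, n):
    """Standard t statistic for testing whether a correlation is zero.

    r is the sample correlation and n is the number of observations;
    the statistic has n - 2 degrees of freedom.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Correlation between TV and newspaper ad spending reported in Figure 5-2.
r_ads = 0.058
keep_both = not is_strongly_correlated(r_ads)
print("Keep both x variables:", keep_both)
```

Because 0.058 is nowhere near the cutoff, the check says both ad-spending variables can stay in the model.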

The large p-value for the correlation between spending for the two ad types confirms your thinking that both variables together may be helpful in estimating y because each makes its own contribution. It also tells you that keeping them both in the model won’t create any multicollinearity problems. (This completes step four of the multiple regression analysis, as listed in the “Stepping through the analysis” section.)

Finding the Best-Fitting Model

After you have a group of x variables that are all related to y and not related to each other (see previous sections), you’re ready to perform step five of the multiple regression analysis (as listed in the “Stepping through the analysis” section). That is, you’re ready to find the model that best fits the data. In the multiple regression model with two x variables, you have the general equation y = b₀ + b₁x₁ + b₂x₂, and you already know which x variables to include in the model (by doing step four); the task now is to figure out which