Page 108 - Statistics II for Dummies

92       Part II: Using Different Types of Regression to Make Predictions



                                To head off the problem of multicollinearity, along with the correlations you
                                examine between each x variable and the response variable y, also find the
                                correlations between all pairs of x variables. If two x variables are highly
                                correlated, don’t leave them both in the model, or multicollinearity will result.
                                To see the correlations between all the x variables, have Minitab calculate
                                a correlation matrix of all the variables (see the section “Finding and
                                interpreting correlations”). For this check, you can ignore the correlations
                                between the y variable and the x variables and look only at the correlations
                                between the x variables shown in the correlation matrix. You find each of
                                those correlations at the intersection of the row for one x variable and
                                the column for the other.
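
The check described above can be sketched in a few lines of Python as an alternative to reading the values off a Minitab correlation matrix. This is only an illustration: the function names (pearson, correlation_matrix, x_pair_correlations) and the small data set are made up for the example and are not the book's ad-spending data.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Return {(name1, name2): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}

def x_pair_correlations(columns, response):
    """Keep only the correlations between distinct x variables (skip the response)."""
    xs = [name for name in columns if name != response]
    matrix = correlation_matrix(columns)
    return {
        (xs[i], xs[j]): matrix[(xs[i], xs[j])]
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
    }

# Made-up illustrative numbers, NOT the book's data.
data = {
    "sales": [10, 12, 15, 19, 22],
    "tv": [1, 2, 3, 4, 5],
    "newspaper": [2, 1, 4, 3, 5],
}

pairs = x_pair_correlations(data, response="sales")
for (first, second), r in pairs.items():
    print(f"{first} vs {second}: r = {r:.3f}")
```

Filtering out the response variable before pairing mirrors the advice to ignore the y-versus-x entries of the matrix for this particular check.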

                                If two x variables x₁ and x₂ are strongly correlated (that is, their correlation
                                is above +0.7 or below –0.7), then one of them would do just about as good a job
                                of estimating y as the other, so you don’t need to include them both in the
                                model. If x₁ and x₂ aren’t strongly correlated, then both of them working
                                together would do a better job of estimating y than either variable alone.
                                For the ad-spending example, you have to examine the correlation between
                                the two x variables, TV ad spending and newspaper ad spending, to be sure no
                                multicollinearity is present. The correlation between these two variables (as
                                you can see in Figure 5-2) is only 0.058. You don’t even need a hypothesis test
                                to tell you whether or not these two variables are related; they’re clearly not.
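
As a rough sketch of the rule of thumb above, the following Python function flags any pair of x variables whose correlation lies beyond the 0.7 cutoff in either direction. The function name and the dictionary layout are assumptions made for illustration; the only value taken from the text is the 0.058 correlation between TV and newspaper ad spending.

```python
def strongly_correlated_pairs(pair_correlations, threshold=0.7):
    """Return the pairs whose correlation is above +threshold or below -threshold."""
    return [pair for pair, r in pair_correlations.items() if abs(r) > threshold]

# Correlation reported in the text for the ad-spending example (Figure 5-2).
ad_pairs = {("TV", "newspaper"): 0.058}

flagged = strongly_correlated_pairs(ad_pairs)
if flagged:
    print("Consider dropping one variable from each pair:", flagged)
else:
    print("No strongly correlated x variables found.")
```

With the ad-spending correlation, nothing is flagged, which matches the conclusion that both variables can stay in the model.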

                                The p-value for the correlation between the spending for the two ad types is
                                0.799 (see Figure 5-2), which is much, much larger than 0.05, so the
                                correlation isn’t statistically significant. The large p-value for the
                                correlation between spending for the two ad types supports your thinking
                                that both variables together may be helpful in estimating y because each
                                makes its own contribution. It also tells you that keeping them both in the
                                model won’t create any multicollinearity problems. (This completes step four
                                of the multiple regression analysis, as listed in the “Stepping through the
                                analysis” section.)
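
Minitab reports the p-value for you, and this section doesn't say how it's computed. One common approach, assumed here, tests whether the true correlation is zero using the statistic t = r·sqrt(n − 2) / sqrt(1 − r²), compared against a t-distribution with n − 2 degrees of freedom. The sketch below computes only that test statistic. The function name is made up, and the sample size n = 22 is a hypothetical value chosen for illustration because the section doesn't state it.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a correlation of r from n pairs differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 0.058            # correlation reported in the text (Figure 5-2)
n_hypothetical = 22  # ASSUMED sample size; not given in this section

t = correlation_t_statistic(r, n_hypothetical)
print(f"t = {t:.3f} on {n_hypothetical - 2} degrees of freedom")
```

A t statistic this close to zero corresponds to a large p-value, which is consistent with the conclusion that the two ad-spending variables aren't significantly correlated.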



                      Finding the Best-Fitting Model for Two x Variables


                                After you have a group of x variables that are all related to y and not related
                                to each other (refer to previous sections), you’re ready to perform step five
                                of the multiple regression analysis (as listed in the “Stepping through the
                                analysis” section). You’re ready to find the best-fitting model for the data.

                                In the multiple regression model with two x variables, you have the general
                                equation y = b₀ + b₁x₁ + b₂x₂, and you already know which x variables to
                                include in the model (by doing step four in the previous section); the task
                                now is to figure out which coefficients (numbers) to put in for b₀, b₁, and b₂,




