Page 116 - Intermediate Statistics for Dummies

                                                Chapter 5: When Two Variables Are Better than One: Multiple Regression
                                                    The letter ρ is the Greek version of r and represents the true correlation of x
                                                    and y in the entire population; r is the correlation coefficient of the sample.
                                                    Any statistical software package can carry out a hypothesis test of a
                                                    correlation for you. The actual formulas used in that process are beyond the
                                                    scope of this book. However, the interpretation is the same as for any test:
                                                    If the p-value is smaller than your prespecified value of α (typically 0.05),
                                                    reject Ho and conclude that x and y are related. Otherwise, you can’t reject
                                                    Ho, and you conclude that you don’t have enough evidence that the variables
                                                    are related.
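The decision rule above can be sketched in a few lines of Python. Because the exact formulas are left to the software, this sketch swaps in a permutation test to approximate the p-value; that is not necessarily the method your statistical package uses. The data values, the function names pearson_r and permutation_p_value, the number of shuffles, and the seed are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Sample correlation coefficient r for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Approximate two-sided p-value for Ho: no relationship between x and y.

    Shuffling y breaks any real link to x, so the shuffled correlations
    show what r looks like when Ho is true.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data, made up for this example only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

alpha = 0.05  # prespecified significance level
r = pearson_r(x, y)
p_value = permutation_p_value(x, y)
decision = "reject Ho" if p_value < alpha else "can't reject Ho"
print(f"r = {r:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

Whatever method produces the p-value, the last step is the same comparison against α described in the text.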
                                                    In Minitab, you can conduct a hypothesis test for a correlation by clicking
                                                    on Stat>Basic Statistics>Correlation, and checking the Display p-values box.
                                                    Choose the variables you want to find correlations for, and click Select.
                                                    You’ll get a small table that shows the correlation for each pair of
                                                    variables, with the respective p-value under each one. You can see the
                                                    correlation output for the ads and sales example in Figure 5-2.
                                                    Looking at Figure 5-2, the correlation of 0.791 between TV ads and sales has
                                                    a p-value of 0.000, which means it’s actually less than 0.001. That p-value is
                                                    much less than 0.05 (your predetermined α level), so the result is highly
                                                    significant, and TV ad spending is strongly related to sales. The correlation
                                                    between newspaper ad spending and sales was 0.594, which is also statistically
                                                    significant, with a p-value of 0.004.
                                         Checking for Multicollinearity
                                                    You have one more very important step to complete in the relationship-
                                                    exploration process before going on to using the multiple regression model.
                                                    That is, you need to complete step four: looking at the relationship between
                                                    the x variables themselves and checking for redundancy. Failure to do so can
                                                    lead to problems during the model-fitting process.
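One way to carry out that check is to compute the correlation for every pair of x variables and flag the pairs that look redundant. The sketch below is a minimal illustration, not a Minitab feature: the data, the names x1 to x3, the helper names, and the 0.8 cutoff are all hypothetical choices made for the example.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Sample correlation coefficient r for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def flag_correlated_predictors(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for each pair of x variables with |r| >= threshold.

    The threshold is an assumed cutoff for this sketch, not a rule from the book.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical x variables: x2 is almost exactly twice x1, x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [5, 1, 4, 2, 6, 3],
}

flagged = flag_correlated_predictors(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} look redundant (r = {r})")
```

Any pair that gets flagged is a candidate for dropping one of its two variables before you fit the model.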
                                                    Multicollinearity is a term you use if two x variables are highly correlated.
                                                    Not only is it redundant to include both related variables in the multiple
                                                    regression model, but it’s also problematic. The bottom line is this: If two
                                                    x variables are strongly correlated, include only one of them in the
                                                    regression model, not both. If you include both, the computer won’t know what
                                                    numbers to give as coefficients for each of the two variables, because they
                                                    share their contribution to determining the value of y. Multicollinearity can
                                                    really mess up the model-fitting process and give answers that are
                                                    inconsistent and oftentimes not repeatable in subsequent studies.
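To see why the computer can't settle on coefficients, it helps to look at the most extreme case, where one x variable is an exact multiple of the other. The sketch below uses made-up numbers and hypothetical helper names (predict, sse) to show that very different coefficient pairs fit the same data equally well.

```python
# Hypothetical data: x2 is exactly twice x1, so the two x variables
# carry the same information about y.
x1 = [1, 2, 3, 4, 5]
x2 = [2 * v for v in x1]
y = [3 * v for v in x1]


def predict(b1, b2):
    """Fitted values for the model y = b1*x1 + b2*x2 (no intercept, for simplicity)."""
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]


def sse(b1, b2):
    """Sum of squared errors between the observed y values and the fitted values."""
    return sum((obs - fit) ** 2 for obs, fit in zip(y, predict(b1, b2)))


# Three very different coefficient pairs, each satisfying b1 + 2*b2 = 3.
candidates = [(3.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
for b1, b2 in candidates:
    print(f"b1 = {b1:5.1f}, b2 = {b2:5.1f}, SSE = {sse(b1, b2)}")
```

Every pair fits perfectly, so nothing in the data favors one over another. When two x variables are highly but not perfectly correlated, the competing fits are nearly tied rather than exactly tied, which is why the estimated coefficients can swing from one sample to the next.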