Page 106 - Statistics II for Dummies

90       Part II: Using Different Types of Regression to Make Predictions



                                Minitab can find a correlation matrix for any set of variables in the
                                model, including the y variable and all the x variables. To calculate a
                                correlation matrix for a group of variables in Minitab, first enter your data in
                                columns (one for each variable). Then go to Stat>Basic Statistics>Correlation.
                                Highlight the variables from the left-hand side for which
                                you want correlations, and click Select.
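
                                If you don’t have Minitab handy, the same kind of matrix can be sketched in a
                                few lines of Python. This is only an illustration, not Minitab’s own
                                procedure: the function names pearson_r and correlation_matrix are made up
                                for this example, and the numbers are hypothetical data, not the values
                                behind Figure 5-2.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of columns."""
    return {
        row: {col: pearson_r(columns[row], columns[col]) for col in columns}
        for row in columns
    }


# HYPOTHETICAL data, one list per variable (like one Minitab column each).
data = {
    "Sales": [10.0, 12.0, 15.0, 19.0, 22.0, 25.0],
    "TV": [1.0, 2.0, 2.5, 4.0, 4.5, 6.0],
    "Newspaper": [3.0, 1.0, 4.0, 2.0, 5.0, 4.0],
}
matrix = correlation_matrix(data)

# Read one correlation by intersecting a row with a column.
print(round(matrix["TV"]["Sales"], 3))
```

                                The matrix is symmetric, so matrix["TV"]["Sales"] and matrix["Sales"]["TV"]
                                give the same number, and every variable has a correlation of 1 with itself.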

                                To read a correlation off the matrix in the computer output, find the
                                intersection of the row and column for the two variables you want; the top
                                number in that intersection is the correlation of those two variables.
                                For example, the correlation between TV ads and TV sales is 0.791, because
                                that’s the value where the TV row meets the Sales column in the correlation
                                matrix in Figure 5-2.

                                Testing correlations for significance

                                By the rule-of-thumb approach from Stats I (also reviewed in Chapter 4),
                                a correlation that’s close to 1 or –1 (starting around ± 0.75) is strong; a
                                correlation close to 0 is very weak/nonexistent; and around ± 0.6 to 0.7, the
                                relationships become moderately strong. The correlation between TV ads
                                and TV sales of 0.791 indicates a fairly strong positive linear relationship
                                between these two variables, based on the rule-of-thumb. The correlation
                                between newspaper ads and TV sales seen in Figure 5-2 is 0.594, which is
                                moderate by my rule-of-thumb.

                                Many times in statistics a rule-of-thumb approach to interpreting a correlation
                                coefficient is sufficient. However, you’re in the big leagues now, so you need
                                a more precise tool for determining whether or not a correlation coefficient is
                                large enough to be statistically significant. That’s the real test of any statistic:
                                not that the relationship is fairly strong or moderately strong in the sample,
                                but whether or not the relationship can be generalized to the population.

                                Now, that phrase statistically significant should ring a bell. It’s your old friend
                                the hypothesis test calling to you (see Chapter 3 for a brush-up on hypothesis
                                testing). Just like a hypothesis test for the mean of a population or the
                                difference in the means of two populations, you also have a test for the
                                correlation between two variables within a population.

                                The hypotheses for testing a correlation are Ho: ρ = 0 (no linear relationship)
                                versus Ha: ρ ≠ 0 (a linear relationship exists). The letter ρ is the Greek
                                version of r and represents the true correlation of x and y in the entire
                                population; r is the correlation coefficient of the sample.
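
                                To see what the software does behind the scenes, here is a minimal sketch of
                                the test. It assumes the standard t statistic for a correlation,
                                t = r√(n – 2)/√(1 – r²) with n – 2 degrees of freedom, which this page
                                doesn’t spell out. The function names are made up for this example, the
                                p-value is a numerical approximation (Simpson’s rule) rather than the routine
                                a statistics package uses, and the sample size n = 25 is hypothetical
                                because the text doesn’t state one.

```python
from math import exp, lgamma, log, pi, sqrt


def t_statistic(r, n):
    """Test statistic for Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const) * (1.0 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3.0  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


r = 0.791  # correlation between TV ads and TV sales quoted in the text
n = 25     # HYPOTHETICAL sample size; the text does not give one
t = t_statistic(r, n)
p = two_sided_p_value(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p-value = {p:.6f}")
```

                                A small p-value (below your chosen α level) is evidence against Ho. Because
                                the p-value here comes from subtracting a numerical integral from 1, treat it
                                as good to a few decimal places only. If SciPy is installed,
                                scipy.stats.pearsonr(x, y) returns both r and the two-sided p-value from the
                                raw data in one call.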

                                  ✓ If you can’t reject Ho based on your data, you can’t conclude that the
                                    correlation between x and y differs from zero. You don’t have
                                    evidence that the two variables are linearly related, so x shouldn’t
                                    be in the multiple regression model.









