Page 133 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
                                                    To get all the pairwise correlations among a set of variables in your model
                                                    by using Minitab, go to Stat>Basic Statistics>Correlation. Then highlight all the
                                                    variables you want correlations for and click Select. (To include the p-value
                                                    for each correlation, click the Display p-values box.) Then click OK. The output
                                                    lists the variables' names across the top row and down the first column. To find
                                                    the correlation for a pair, look in the cell where the row for the first variable
                                                    meets the column for the second variable.
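If you don't have Minitab handy, the same kind of row-by-column correlation listing can be sketched in Python. This is a minimal sketch, not a reproduction of Minitab's output: the variable names and all the numbers in `data` are made up for illustration (they are not the punting data), and the helper names `pearson_r` and `correlation_matrix` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(data):
    """Return {row_name: {column_name: r}} for every pair of variables."""
    names = list(data)
    return {r: {c: pearson_r(data[r], data[c]) for c in names} for r in names}

# Made-up values for illustration only; NOT the data behind Table 6-2.
data = {
    "distance":  [150.0, 140.0, 163.0, 128.0, 171.0, 155.0],
    "hang_time": [4.1, 3.9, 4.6, 3.5, 4.7, 4.2],
    "strength":  [160.0, 150.0, 180.0, 130.0, 170.0, 175.0],
}

matrix = correlation_matrix(data)

# Print names across the top row and down the first column.
names = list(data)
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for row in names:
    print(f"{row:<12}" + "".join(f"{matrix[row][col]:>12.3f}" for col in names))
```

To read the printout, find the row for one variable and the column for the other; the number where they meet is the correlation for that pair. The diagonal is always 1 because each variable is perfectly correlated with itself.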
                                                    Table 6-2 shows the correlations you can calculate between y = punt distance
                                                    and each of the x variables. These results confirm what the scatterplots were
                                                    telling you. Distance seems to be related to all the variables except left leg
                                                    flexibility, because that’s the only variable that didn’t have a statistically
                                                    significant correlation with distance using the α level 0.05. (For more info on the
                                                    test for correlation, see Chapter 5.)
                                                    Table 6-2    Correlations between Distance of a Punt and Other Variables

                                                    X Variable               Correlation with Punt Distance    P-value
                                                    Hang time                0.819                             0.001*
                                                    Right leg strength       0.791                             0.001*
                                                    Left leg strength        0.744                             0.004*
                                                    Right leg flexibility    0.806                             0.001*
                                                    Left leg flexibility     0.408                             0.167
                                                    Overall leg strength     0.796                             0.001*

                                                    * Statistically significant at level α = 0.05
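To see roughly where the asterisks in Table 6-2 come from, here is a hedged sketch of one common test for a correlation: the t statistic t = r·√(n − 2) / √(1 − r²), compared against a critical value from a t table. This section doesn't state the sample size or spell out the test from Chapter 5, so the value n = 13 and the function name `correlation_t_statistic` are assumptions made purely for illustration. The critical value 2.201 is the standard two-sided cutoff at α = 0.05 with n − 2 = 11 degrees of freedom.

```python
from math import sqrt

ASSUMED_N = 13       # ASSUMPTION: sample size is not given in this section
CRITICAL_T = 2.201   # two-sided t cutoff, alpha = 0.05, df = ASSUMED_N - 2 = 11

def correlation_t_statistic(r, n):
    """t statistic for testing whether a population correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Correlations with punt distance, copied from Table 6-2.
table_6_2 = {
    "Hang time": 0.819,
    "Right leg strength": 0.791,
    "Left leg strength": 0.744,
    "Right leg flexibility": 0.806,
    "Left leg flexibility": 0.408,
    "Overall leg strength": 0.796,
}

results = {}
for name, r in table_6_2.items():
    t = correlation_t_statistic(r, ASSUMED_N)
    significant = abs(t) > CRITICAL_T
    results[name] = (t, significant)
    flag = "significant" if significant else "not significant"
    print(f"{name:<24} r = {r:.3f}   t = {t:.2f}   {flag}")
```

Under these assumptions, every variable except left leg flexibility clears the cutoff, which agrees with the asterisks in the table. With a different sample size, both the t statistics and the cutoff would change.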
                                                    If you take a look at Figure 6-1, you can see that hang time is related to other
                                                    variables such as right leg and left leg strength, right leg flexibility, and so
                                                    on. This is where things start to get sticky. Hang time is related to distance,
                                                    and lots of other variables are related to hang time. Although hang time has
                                                    the strongest correlation with distance, the final multiple regression model may
                                                    not include it. Here’s one possible scenario: You find a combination of other
                                                    x variables that together do a good job of estimating y, and all of those
                                                    variables are strongly related to hang time. In that case, hang time adds little
                                                    information that the model doesn’t already have, so in the end you may not need
                                                    to include it. Strange things happen when you have many different x variables
                                                    to choose from.
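The scenario above can be illustrated with a small made-up example. In this sketch, `x3` plays the role that hang time plays in the story: it has the strongest correlation with `y`, yet it's built almost entirely from `x1` and `x2`, so adding it to a model that already contains them barely changes R². All the numbers are invented for illustration, and the helpers `pearson_r`, `solve`, and `r_squared` are hypothetical names, not part of any statistics package.

```python
from math import sqrt

# Made-up data for illustration only; NOT the punting data.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
noise_y = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.1]
noise_x3 = [0.2, 0.1, -0.3, 0.3, -0.1, -0.2, 0.4, 0.0, 0.1, -0.4, 0.2, -0.3]

y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise_y)]
x3 = [a + 1.5 * b + d for a, b, d in zip(x1, x2, noise_x3)]  # nearly x1 + 1.5*x2

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(predictors, y):
    """R-squared of a least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

r2_without_x3 = r_squared([x1, x2], y)
r2_with_x3 = r_squared([x1, x2, x3], y)

for name, x in (("x1", x1), ("x2", x2), ("x3", x3)):
    print(f"correlation of {name} with y: {pearson_r(x, y):.3f}")
print(f"R-squared using x1 and x2:     {r2_without_x3:.4f}")
print(f"R-squared using x1, x2, x3:    {r2_with_x3:.4f}")
```

The point of the comparison is the last two lines of output: a variable can top the correlation list and still contribute next to nothing once the variables it depends on are already in the model.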
                                                    After you narrow down the set of possible x variables for inclusion in the
                                                    model to predict punt distance, the next step is to put those variables through
                                                    a selection procedure of some sort, which trims down the list to a set of essential
                                                    variables for predicting y. The next sections show various techniques for
                                                    going through this model selection process.