Page 138 - Intermediate Statistics for Dummies

                                          Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
                                                    In the next part of the output, you see that at Step 1 the model has the constant
                                                    listed as –22.33. You can also see it includes hang time as the first variable in
                                                    the model. In the section “Exploring scatterplots and correlations,” you can see
                                                    that hang time is one of the more prominent variables, so you may not be
                                                    surprised that it shows up in the model selection process right away.
                                                    The p-value of hang time is 0.001, which is less than α = 0.05, so the
                                                    variable is significant. However, no Step 2 is in this output. That means after
                                                    hang time was included, no other variable made a significant enough
                                                    contribution beyond hang time. The other variables’ p-values were all greater
                                                    than 0.05.
                                                    The forward selection procedure’s modus operandi is that you have to be
                                                    in the in-crowd in order to be added to the model. The model is like an A-list
                                                    in a way.
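The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the software's actual routine: the function forward_select and the callback p_value_if_added are hypothetical names, and the callback stands in for refitting the regression at each step. Only the 0.001 p-value for hang time comes from the output discussed here; the other variable names and p-values are made up.

```python
from typing import Callable, List, Sequence


def forward_select(
    candidates: Sequence[str],
    p_value_if_added: Callable[[List[str], str], float],
    alpha: float = 0.05,
) -> List[str]:
    """Greedy forward selection driven by entry p-values.

    p_value_if_added(selected, name) is a hypothetical callback returning the
    p-value of `name` when it joins a model that already holds `selected`.
    In practice it would refit the regression each time.
    """
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # Score every variable that is not yet in the model.
        scored = [(p_value_if_added(selected, name), name) for name in remaining]
        best_p, best_name = min(scored)
        # Stop as soon as the best remaining candidate misses the entry level.
        if best_p >= alpha:
            break
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


def toy_p_value(selected: List[str], name: str) -> float:
    """Toy stand-in: 0.001 is from the text, 0.30 is an invented placeholder."""
    if name == "hang_time" and not selected:
        return 0.001
    return 0.30


chosen = forward_select(["hang_time", "other_var_a", "other_var_b"], toy_p_value)
print(chosen)  # ['hang_time']
```

With these toy numbers the loop admits hang time at Step 1 and then stops, which mirrors the missing Step 2 in the output.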
                                                    The final model for the punt distance data using the forward selection
                                                    procedure with α = 0.05 is y = –22.33 + 43.50x, where y = punt distance and
                                                    x = punt hang time. Note that this is a simple linear regression model
                                                    (Chapter 4 style), because it has only one x variable in it.
                                                    You can now use this final model to predict punt distance by using hang time.
                                                    Say the hang time is three seconds. That means the punt is expected to go
                                                    y = –22.33 + 43.50 * 3 = 108.17 feet, or 36.06 yards. Hang times for punts can
                                                    range anywhere from 0 seconds if the punt is blocked to around 5.00 seconds
                                                    (see Table 6-1), so don’t put numbers into this equation like 8 seconds. That
                                                    would make for an unbelievable punt distance. Seriously!
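If you want to check the arithmetic yourself, here is a minimal Python sketch of the prediction. The coefficients come from the final model above; the function name predict_punt_distance_feet and the 5-second upper limit are assumptions for illustration, with the limit taken from the approximate range of hang times mentioned in the text.

```python
INTERCEPT = -22.33    # constant from Step 1 of the output
SLOPE = 43.50         # coefficient of hang time
FEET_PER_YARD = 3.0


def predict_punt_distance_feet(hang_time_s, max_hang_time_s=5.0):
    """Predict punt distance in feet from hang time in seconds.

    The upper limit is an assumed guard against extrapolating beyond
    the hang times the model was built on.
    """
    if not 0.0 <= hang_time_s <= max_hang_time_s:
        raise ValueError(
            f"hang time {hang_time_s} s is outside 0 to {max_hang_time_s} s"
        )
    return INTERCEPT + SLOPE * hang_time_s


feet = predict_punt_distance_feet(3.0)
yards = feet / FEET_PER_YARD
print(f"{feet:.2f} feet, {yards:.2f} yards")  # 108.17 feet, 36.06 yards
```

Calling the function with 8 seconds raises an error instead of returning a distance, which is one simple way to keep yourself from extrapolating.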
                                                    You can find the coefficient of an x variable by looking at the value in the
                                                    output directly across from the name of the variable. Under that value is the
                                                    t-value of this coefficient, and its p-value follows.
                                                    Looking at the fit of the final model
                                                    The value of R² adjusted for this model, as shown in Figure 6-2, is 64.06
                                                    percent, which may not seem all that great. However, you’re dealing with a
                                                    simple linear regression model, and the value of R in this case is the
                                                    correlation coefficient between hang time and distance. This value of R
                                                    (denoted by small r in its own simple regression context) is the square root
                                                    of 0.6406, which is 0.80. This correlation is somewhat strong, actually, so the
                                                    model fits fairly well. Mallows’s C-p concurs, with a relatively small value of
                                                    1.7, as you can see on the last line of Figure 6-2.
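The square-root step is quick to verify in Python. One hedge: in simple linear regression, r squared equals the unadjusted R², and the adjusted value is always a little smaller, so treat the 0.80 here as a close approximation of the correlation. The variable names are just for illustration.

```python
import math

r_squared_adjusted = 0.6406          # 64.06 percent, from Figure 6-2
r = math.sqrt(r_squared_adjusted)    # positive root, because the slope is positive
print(f"{r:.2f}")  # 0.80
```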
                                                    A cautionary word about entry level
                                                    So you can have an example where you see more than one variable added to
                                                    a model via forward selection, I conducted a forward selection procedure on