Page 138 - Intermediate Statistics for Dummies
Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
In the next part of the output, you see that at Step 1 the model has the constant
listed as –22.33. You can also see it includes hang time as the first variable in
the model. In the section “Exploring scatterplots and correlations,” you can see
that hang time is one of the more prominent variables, so you may not be
surprised that it shows up in the model selection process right away.
The p-value of hang time is 0.001, indicating that the variable is significant
(less than α = 0.05). However, no Step 2 is in this output. That means after hang
time was included, no other variables made a significant enough contribution
beyond hang time. The other variables’ p-values were all greater than 0.05.
The forward selection procedure’s modus operandi is that you have to be
in the in-crowd in order to be added to the model. The model is like an A-list
in a way.
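The entry rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the statistical software's actual routine: the function name forward_select and the p_value_if_added callback are hypothetical, and the toy p-values for other_a and other_b are made-up placeholders (only hang time's 0.001 comes from the output discussed here).

```python
def forward_select(candidates, p_value_if_added, alpha=0.05):
    """Add variables one at a time while the best candidate's p-value is below alpha.

    p_value_if_added(selected, name) is a hypothetical callback that returns the
    p-value the variable `name` would have if added to the model that already
    contains the variables in `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        # p-value each remaining variable would get if it joined the current model
        p_values = {name: p_value_if_added(selected, name) for name in remaining}
        best = min(p_values, key=p_values.get)
        if p_values[best] >= alpha:
            break  # nobody left makes the A-list, so there is no next step
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_p_value(selected, candidate):
    # Placeholder lookup: a real run would refit the regression at each step.
    # Only hang_time's 0.001 is from the text; the other values are invented.
    table = {"hang_time": 0.001, "other_a": 0.20, "other_b": 0.45}
    return table[candidate]


print(forward_select(["other_a", "hang_time", "other_b"], toy_p_value))  # ['hang_time']
```

With these placeholder values the procedure stops after Step 1, which mirrors the output: hang time gets in, and no other variable clears the α = 0.05 bar.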
The final model for the punt distance data using the forward selection procedure
with α = 0.05 is y = –22.33 + 43.50x, where y = punt distance and x = punt
hang time. Note that this is a simple linear regression model (Chapter 4 style),
because it has only one x variable in it.
You can now use this final model to predict punt distance by using hang time.
Say the hang time is three seconds. That means the punt is expected to go
y = –22.33 + 43.50 * 3 = 108.17 feet, or 36.06 yards. (Hang times for punts can
range anywhere from 0 seconds, if the punt is blocked, to around 5.00 seconds;
see Table 6-1. So don’t put numbers like 8 seconds into this equation. That
would make for an unbelievable punt distance. Seriously!)
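If you want to check the arithmetic, here is a small Python sketch of the prediction. The coefficients come from the final model; the function name and the hard stop outside the 0 to 5.00 second range are illustrative assumptions based on the caution above, not part of the regression output.

```python
INTERCEPT = -22.33   # constant from the final model
SLOPE = 43.50        # coefficient of hang time
MIN_HANG_S, MAX_HANG_S = 0.0, 5.0  # assumed usable range of hang times, in seconds


def predict_punt_distance_feet(hang_time_s):
    """Predicted punt distance in feet for a hang time in seconds."""
    if not MIN_HANG_S <= hang_time_s <= MAX_HANG_S:
        # Refuse to extrapolate beyond the hang times the model was built on
        raise ValueError(f"hang time {hang_time_s} s is outside the observed range")
    return INTERCEPT + SLOPE * hang_time_s


feet = predict_punt_distance_feet(3.0)
print(f"{feet:.2f} feet, {feet / 3:.2f} yards")  # 108.17 feet, 36.06 yards
```

Calling the function with 8 seconds raises an error instead of returning a distance, which is the code version of "don’t put numbers like 8 seconds into this equation."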
You can find the coefficient of an x variable by looking at the value in the
output directly across from the name of the variable. Under that value is the
t-value of this coefficient, and its p-value follows.
Looking at the fit of the final model
The value of R² adjusted for this model, as shown in Figure 6-2, is 64.06 percent,
which may not seem all that great. However, you’re dealing with a
simple linear regression model, and the value of R in this case is the correlation
coefficient between hang time and distance. This value of R (denoted
by small r in its own simple regression context) is the square root of 0.6406,
which is 0.80. This correlation is somewhat strong, actually, so the model fits
fairly well. Mallows’s C-p concurs, with a relatively small value of 1.7, as you can see
on the last line of Figure 6-2.
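The square-root step is quick to verify in Python. This sketch follows the text in taking the square root of the 0.6406 figure; giving r the same sign as the slope is a standard fact for simple linear regression, and the variable names are just for illustration.

```python
import math

r_squared_adjusted = 0.6406  # 64.06 percent, from Figure 6-2
slope = 43.50                # coefficient of hang time in the final model

# In simple linear regression, r carries the same sign as the slope
r = math.copysign(math.sqrt(r_squared_adjusted), slope)
print(round(r, 2))  # 0.8
```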
A cautionary word about entry level
So you can have an example where you see more than one variable added to
a model via forward selection, I conducted a forward selection procedure on