Page 129 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
Getting a Kick out of Estimating Punt Distance
Before you jump into a model selection procedure to predict y by using a set
of x variables, you have to do some legwork. The variable of interest is y, and
that’s a given. But where do the x variables come from? How do you choose
which ones to investigate as being possible candidates for predicting y? And
how do those possible x variables interact with each other toward making
that prediction? All of these questions must be answered before any model
selection procedure can be used. However, this part is the most challenging
and the most fun; a computer can’t think up x variables for you!
Suppose you’re at a football game and the opposing team has to punt the
ball. You see the punter line up and get ready to kick the ball, and a question
comes to you. “Gee, I wonder how far this punt will go? I wonder what factors
influence the distance of a punt? Can I use those factors in a multiple regression
model to try to estimate punt distance? Hmm, I think I’ll consult my
Intermediate Statistics For Dummies book on this and analyze some data
during halftime. . . .” Well, maybe that’s pushing it, but it’s still an interesting
question for football players, golfers, soccer players, and even baseball
players. Everyone’s looking for more distance and a way to get it.
In the following sections, you can see how to identify and assess different x
variables in terms of their potential contribution to predicting y.
Brainstorming variables and collecting data
Starting with a blank slate and trying to think of a set of x variables that may
be related to y may sound like a daunting task, but in reality, this task is probably
not as bad as you think. Most researchers who are interested in predicting
y in the first place have some ideas about which variables may be related
to it. After you come up with a set of logical possibilities for x, you collect
data on those variables, as well as y, to see what their actual relationship
with y may be.
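As a rough first look at how each candidate x relates to y, you can compute the correlation between each x variable and y once the data are in hand. The following is only a minimal sketch: the variable names x1 and x2, the pearson_r helper, and all the numbers are made up for illustration and don't come from any real study.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not real data).
y = [150, 160, 145, 170, 155]            # the response you want to predict
candidates = {
    "x1": [4.0, 4.5, 3.8, 4.9, 4.2],     # hypothetical candidate predictor
    "x2": [10, 9, 12, 8, 11],            # another hypothetical candidate
}

# Correlate every candidate x with y, then list the strongest first.
correlations = {name: pearson_r(xs, y) for name, xs in candidates.items()}
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.3f}")
```

A correlation near +1 or -1 suggests a strong linear relationship with y, and a value near 0 suggests a weak one. It looks at one x at a time, so it says nothing about how the x variables work together.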
The Virginia Polytechnic Institute did a study to try to estimate the distance
of a punt in football (something Ohio State fans aren’t familiar with). Possible
variables they thought may be related to the distance of a punt included the
following: hang time (time in the air, in seconds), right leg strength (measured