Page 122 - Statistics II for Dummies
Part II: Using Different Types of Regression to Make Predictions
Other variables that you might expect to be related to punt distance
include the direction and speed of the wind at the time of the punt, the angle
at which the ball was snapped, the average distance of punts made in the
past by a particular punter, whether the game is at home or away in a hostile
environment, and so on. However, these researchers seem to have enough
information on their hands to build a model to estimate punt distance.
For the sake of simplicity, you can assume the kicker is right-footed. That
isn’t always the case, but right-footed kickers are the overwhelming majority.
Looking just at this raw data set in Table 6-1, you can’t figure out which
variables, if any, are related to distance of the punt or how those variables may
be related to punt distance. You need more analyses to get a handle on this.
Examining scatterplots and correlations
After you’ve identified a set of possible x variables, the next step is to find out
which of these variables are highly related to y in order to start trimming
down the set of possible candidates for the final model. In the punt distance
example, the goal is to see which of the six variables in Table 6-1 are strongly
related to punt distance. The two ways to look at these relationships are
✓ Scatterplot: A graphical technique
✓ Correlation: A one-number measure of the linear relationship between
two variables
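To make the one-number idea concrete, here is a minimal Python sketch, assuming the correlation meant here is the usual Pearson correlation coefficient. The function name pearson_r and the sample values are assumptions for illustration only; the numbers are made up and are not the data in Table 6-1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, NOT the data in Table 6-1.
hang_time = [3.9, 4.1, 4.3, 4.6, 4.8]       # seconds (hypothetical)
distance = [38.0, 41.5, 43.0, 47.5, 49.0]   # yards (hypothetical)

r = pearson_r(hang_time, distance)
print(round(r, 3))
```

A result close to +1 points to a strong uphill linear relationship, a result close to -1 points to a strong downhill one, and a result near 0 suggests little or no linear relationship.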
Seeing relationships through scatterplots
To begin examining the relationships between the x variables and y, you use
a series of scatterplots. Figure 6-1 shows all the scatterplots — not only of
each x variable with y but also of each x variable with the other x variables.
The scatterplots are in the form of a matrix, which is a table made of rows
and columns. For example, the first scatterplot in row two of Figure 6-1
looks at the variables of distance (which appears in column one) and hang
time (which appears in row two). This scatterplot shows a possible positive
(uphill) linear relationship between distance and hang time.
Note that Figure 6-1 is essentially a symmetric matrix across the diagonal
line. The scatterplot for distance and hang time is the same as the scatterplot
for hang time and distance; the x and y axes are just switched. The essential
relationship shows up either way. So you only have to look at all the
scatterplots below the diagonal (where the variable names appear) or above the
diagonal. You don’t need to examine both.
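The symmetry argument can be checked with a short Python sketch. It lists the panels below the diagonal of a scatterplot matrix and confirms that every pair of variables shows up there exactly once. The function name panels_below_diagonal is hypothetical, and "x3" and "x4" are placeholder variable names; only distance and hang time are taken from the text.

```python
from itertools import combinations

# "distance" and "hang time" come from the text; "x3" and "x4" are placeholders.
variables = ["distance", "hang time", "x3", "x4"]


def panels_below_diagonal(names):
    """Return (row_variable, column_variable) for each panel below the diagonal."""
    return [
        (names[row], names[col])
        for row in range(len(names))
        for col in range(row)  # columns to the left of the diagonal only
    ]


below = panels_below_diagonal(variables)

# Every unordered pair of variables appears exactly once below the diagonal.
all_pairs = {frozenset(pair) for pair in combinations(variables, 2)}
assert {frozenset(panel) for panel in below} == all_pairs
assert len(below) == len(all_pairs)

k = len(variables)
print(len(below), "of", k * k - k, "off-diagonal panels need a look")
print(below[0])  # first panel in row two: ('hang time', 'distance')
```

With k variables, the matrix has k × (k − 1) off-diagonal panels but only k × (k − 1) / 2 distinct pairs, so looking at one side of the diagonal cuts the work in half.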