Page 61 - Intermediate Statistics for Dummies
Part I: Data Analysis and Model-Building Basics
the closer the correlation is to –1.0 or +1.0. A positive correlation means that
as x increases on the x-axis, y also increases on the y-axis. Statisticians call
this type of relationship an uphill relationship. A negative correlation means
that as x increases on the x-axis, y goes down. Statisticians call this type of
relationship — you guessed it — a downhill relationship.
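To make the uphill/downhill idea concrete, here is a minimal Python sketch of the standard Pearson correlation formula. The two small data sets are hypothetical, made up only for illustration, and are not the book's golf data. The sign of r tells you the direction, and its size tells you how close the points are to a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # r is undefined (division by zero) if all xs or all ys are identical
    return sxy / math.sqrt(sxx * syy)

def direction(r):
    """Label the relationship by the sign of r."""
    if r > 0:
        return "uphill"
    if r < 0:
        return "downhill"
    return "no linear trend"

# Hypothetical data for illustration only (not from the book)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # y tends to rise as x rises
y_down = [9, 7, 6, 4, 2]  # y tends to fall as x rises

for label, ys in (("y_up", y_up), ("y_down", y_down)):
    r = pearson_r(x, ys)
    print(label, round(r, 2), direction(r))
```

In Python 3.10 and later, `statistics.correlation(x, y)` computes the same Pearson coefficient.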
For the golf data set, the correlation is 0.896 (about 0.90), which is extremely high
as correlations go. This strong correlation (close to +1.0) is a good thing
because it means number of putts can do a great job of predicting total score.
Because the correlation is positive, as your number of putts increases, your
total score increases as well (an uphill relationship). For
instructions on calculating a correlation in Minitab, see Chapter 4.
Making predictions
If you want to predict some response variable (y) using one explanatory variable
(x), and you want to use a straight line to do it, you can use simple linear
regression (see Chapter 4 for all the fine points on this topic). Linear regression
finds the best-fitting line that cuts through the data set, called the regression
line. After you get the regression line, you can plug in a value of x and
get your prediction for y. (For instructions on using Minitab to find the
best-fitting line for your data, see Chapter 4.)
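As a rough sketch of what the software does behind the scenes, the code below fits a line using the usual least-squares criterion (the standard meaning of "best-fitting" in simple linear regression). The (putts, score) pairs are hypothetical, invented for illustration, and are not the data from Table 2-2.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x   # the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical (putts, score) pairs for illustration only; NOT Table 2-2
putts = [30, 32, 34, 36, 38]
scores = [84, 89, 90, 95, 96]

a, b = fit_line(putts, scores)
print(f"score = {a:.1f} + {b:.2f} * putts")

# Plug in a value of x to get a prediction for y
print(f"predicted score for 35 putts: {a + b * 35:.1f}")
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the slope and intercept of the same least-squares line.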
To use the golf example from the previous section, suppose you want to predict
the total score you can get for a certain number of putts. In this case, you
want to calculate the linear regression line. When you run a regression analysis
on the data set shown in Table 2-2, the computer tells you that the
best line for predicting total score from number of putts is the following:
Total score = 39.6 + 1.52 * Number of putts
So if you have 35 putts over an 18-hole round, your total score is predicted
to be about 39.6 + 1.52 * 35 = 92.8, or 93. (Not bad for 18 holes!)
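The prediction step is just arithmetic with the intercept (39.6) and slope (1.52) quoted above. A minimal Python sketch, where the function name `predict_score` is our own hypothetical label:

```python
# Coefficients from the regression output quoted in the text
INTERCEPT = 39.6
SLOPE = 1.52

def predict_score(putts):
    """Predicted total score for a given number of putts."""
    return INTERCEPT + SLOPE * putts

pred = predict_score(35)
print(f"{pred:.1f}")  # 92.8
print(round(pred))    # 93
```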
Notice that the slope of the regression line tells you what you really want to
know: how much does your total score increase with every additional putt?
In other words, how much damage is done when you miss the hole on your
first, or second, or third putt? The slope of the regression line for the golf
data set is 1.52. Because the slope of a line is the ratio of the change in y
(total score) to the change in x (number of putts), every additional putt
you need increases your predicted total score by 1.52 strokes.
Maybe that’s why Tiger Woods spends so much time on his short game.
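To see the "rise over run" reading of the slope numerically, the sketch below reuses the equation from the text and checks two things: the slope recovered from any two points on the line is 1.52, and one extra putt always adds 1.52 to the predicted score. The helper names `predict_score` and `rise_over_run` are hypothetical, chosen for this illustration.

```python
# Coefficients from the regression output quoted in the text
INTERCEPT = 39.6
SLOPE = 1.52

def predict_score(putts):
    """Predicted total score from the fitted line."""
    return INTERCEPT + SLOPE * putts

def rise_over_run(x1, y1, x2, y2):
    """Slope as (change in y) / (change in x) between two points."""
    return (y2 - y1) / (x2 - x1)

# Any two points on the line give back the slope
print(round(rise_over_run(30, predict_score(30), 40, predict_score(40)), 2))  # 1.52

# One extra putt raises the predicted score by the slope, wherever you start
for putts in (30, 35, 40):
    extra = predict_score(putts + 1) - predict_score(putts)
    print(f"{putts} -> {putts + 1} putts: +{extra:.2f} strokes")
```

Because the line is straight, the per-putt increase does not depend on where you start; three extra putts add about 3 * 1.52 = 4.56 strokes to the predicted score.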