Page 80 - Statistics II for Dummies
Many times in Stats I courses, the concept of margin of error is skipped over
after the best-fitting regression line is found, but these are very important
ideas and should always be included. (Okay, enough of the soapbox for now.
Let’s get out there and do it!)
Scrutinizing the slope
Recall the slope of the regression line is the amount by which you expect the
y variable to change on average as the x variable increases by 1 unit — the
old rise-over-run idea (see the section “The slope of the regression line”
earlier in this chapter). Now, how do you deal with knowing the best-fitting
line will change with a new data set? You just apply the basic ideas of confidence
intervals and hypothesis tests (see Chapter 3).
A confidence interval for slope
A confidence interval in general has this form: your statistic plus or minus a
margin of error. The margin of error includes a certain number of standard
deviations (or standard errors) from your statistic. How many standard
errors you add and subtract depends on what confidence level, 1 – α, you
want. The size of the standard error depends on the sample size and other
factors.
The equation of the best-fitting simple linear regression line, y = a + bx,
includes a slope (b) and a y-intercept (a). Because these were found using
the data, they’re only estimates of what’s going on in the population, and
therefore they need to be accompanied by a margin of error.
The formula for a 1 – α level confidence interval for the slope of a
regression line is b ± t* · SE(b), where the standard error of the slope is
denoted SE(b) = s ÷ √Σ(xᵢ – x̄)², and s is the square root of the mean
squared error (MSE). The value of t* comes from the t-distribution with
n – 2 degrees of freedom and area to its right equal to α ÷ 2. (See Chapter 3
regarding the concept of α.)
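The calculation above can be sketched in a few lines of Python using NumPy and SciPy. The data here are made up purely for illustration; any (x, y) sample would do:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 10.1, 11.8])
n = len(x)

# Fit the best-fitting line y = a + b*x by least squares
b, a = np.polyfit(x, y, 1)

# s is the square root of the MSE; note the n - 2 in the denominator
residuals = y - (a + b * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Standard error of the slope: s divided by sqrt of sum((x - xbar)^2)
se_b = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# t* from the t-distribution with n - 2 degrees of freedom,
# area alpha/2 to its right
alpha = 0.05
t_star = stats.t.ppf(1 - alpha / 2, df=n - 2)

lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"slope = {b:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

As a sanity check, `scipy.stats.linregress` computes the same slope and standard error directly, so the hand-rolled version should match it exactly.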
In case you wonder why you see n – 2 degrees of freedom here, as opposed
to n – 1 degrees of freedom used in t-tests for the population mean in Stats I,
here’s the scoop. From Stats I you know that a parameter is a number that
describes the population; it’s usually unknown, and it can change from
scenario to scenario. For each parameter you estimate in a model, you lose
1 degree of freedom. The
regression line contains two parameters — the slope and the y-intercept —
and you lose 1 degree of freedom for each one. With the t-test from Stats I you
only have one parameter, the population mean, to worry about, hence you use
n – 1 degrees of freedom.
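That extra lost degree of freedom has a visible effect: for the same sample size and confidence level, t* is a bit larger with n – 2 degrees of freedom than with n – 1, so the regression interval comes out slightly wider. A quick check in Python (n = 10 is an arbitrary choice here):

```python
from scipy import stats

n = 10
alpha = 0.05

# t* for a slope interval: two parameters estimated, so n - 2 df
t_regression = stats.t.ppf(1 - alpha / 2, df=n - 2)

# t* for a one-sample mean interval: one parameter estimated, so n - 1 df
t_one_mean = stats.t.ppf(1 - alpha / 2, df=n - 1)

print(t_regression, t_one_mean)
```

Fewer degrees of freedom means fatter tails in the t-distribution, which is why the first value is larger.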