Page 123 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
Checking the Fit of the Model
Before you run to your boss in triumph saying you’ve slam-dunked the question
of how to estimate plasma TV sales, you first have to make sure all your
i’s are dotted and all your t’s are crossed, as you do with any other statistical
procedure. In this case, you have to check the conditions of the multiple
regression model. These conditions mainly focus on the residuals (the
differences between the observed values of y in your data and the values of y
that the model estimates). If the model is close to the actual data you collected, you can
feel somewhat confident that if you collected more data, it would fall in line
with the model as well, and your predictions shouldn’t be too bad.
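To make the idea of a residual concrete, here is a minimal sketch in Python. It assumes you already have the observed y values and the values the model estimates for the same data points; the function name and the numbers are made up for illustration and aren't from the plasma TV data. It uses the usual convention of observed minus estimated.

```python
def residuals(observed, fitted):
    """Return one residual (observed minus fitted) per data point."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, fitted)]


# Made-up numbers for illustration only; they are not from the book's data set.
observed_sales = [12.0, 15.5, 9.8, 20.1, 17.3]
fitted_sales = [11.4, 16.0, 10.5, 19.2, 17.9]

resid = residuals(observed_sales, fitted_sales)
for y, y_hat, e in zip(observed_sales, fitted_sales, resid):
    print(f"observed={y:5.1f}  fitted={y_hat:5.1f}  residual={e:+.1f}")
```

A positive residual means the model underestimated that data point; a negative one means it overestimated.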
In this section, you see what the conditions are for multiple regression and
the specific techniques statisticians use to check each of those conditions. The
main character in all of this condition checking is the residual.
Noting the conditions
The conditions for multiple regression concentrate on the error terms, or
residuals. The residuals are the amount that’s left over after the model has been fit.
They represent the difference between the actual value of y observed in the
data set and the estimated value of y based on the model. The conditions of
the multiple regression model are the following (note that all need to be met in
order to give the go-ahead for a multiple regression model):
The residuals have a normal distribution with mean zero.
The residuals have the same variance for each fitted (predicted) value of y.
The residuals are independent (don’t affect each other).
Plotting a plan to check the conditions
It may sound like you have a ton of things to check here and there, but luckily,
Minitab gives you all the info you need to know in a series of four graphs,
all presented at one time. These plots are called the residual plots. One of
them graphs the residuals against the values of a normal distribution to see
whether the normality condition fits.