Page 122 - Intermediate Statistics for Dummies
P. 122
10_045206 ch05.qxd 2/1/07 9:50 AM Page 101
Chapter 5: When Two Variables Are Better than One: Multiple Regression
Predicting Y by Using the X Variables
By now, you should have your multiple regression model. You’re finally ready
to complete step six of the multiple regression analysis: to predict the value
of y given a set of values for the x variables. To make this prediction, you take
those x values for which you want to predict y, plug them into the multiple
regression model, and simplify.
In the ads and plasma TV sales example (see analysis from Figure 5-3), the
best-fitting model is y = 5.26 + 0.162x 1 + 0.249x 2 . In the context of the problem,
the model is Sales = 5.26 + 0.162 TV ad spending (x 1 ) + 0.249 newspaper ad
spending (x 2 ).
Remember that the units for plasma TV sales is in millions of dollars and the
units for ad spending for both TV and newspaper ads is in the thousands of
dollars. That is, $20,000 spent on TV ads means x 1 = 20 in the model. Similarly,
$10,000 spent on newspaper ads means x 2 = 10 in the model. Forgetting the 101
units can lead to serious miscalculations.
Suppose you want to estimate plasma TV sales if you spend $20,000 on TV
ads and $10,000 on newspaper ads. Plug x 1 = 20 and x 2 = 10 into the multiple
regression model, and you get y = 5.26 + 0.162(20) + 0.249(10) = 10.99. In other
words, if you spend $20,000 on TV advertising and $10,000 in newspaper
advertising, you estimate that sales will be $10.99 million dollars.
This estimate at least makes some sense in terms of the data shown in
Table 5-1. At location ten, they spent $20,000 on TV ads and $5,000 on news-
paper ads (short of what you had) and got sales of $9.82 million. Location
eleven spent a little more on TV ads and a lot more on newspaper ads than
what you had, and got sales of $16.28 million. Your spending amounts fall
between the amounts of locations ten and eleven, and your estimated sales
fall in between theirs also.
Be careful to put in only values for the x variables that fall in the range of
where the data lies. In other words, Table 5-1 shows data for TV ad spending
between $0 and $50,000; newspaper ad spending goes from $0 to $25,000. It
would not be appropriate, say, to try to estimate sales for spending amounts
of $75,000 for TV ads and $50,000 for newspaper ads, respectively. The reason
is that the regression model you came up with only fits the data that you col-
lected; you have no way of knowing whether that same relationship contin-
ues outside that area. This no-no of estimating y for values of the x variables
outside their range is called extrapolation. As one of my colleagues says,
“Friends don’t let friends extrapolate.”