Page 113 - Statistics II for Dummies
P. 113
Chapter 5: Multiple Regression with Two X Variables 97
Predicting y by Using the x Variables
When you have your multiple regression model, you’re finally ready to
complete step six of the multiple regression analysis: to predict the value of
y given a set of values for the x variables. To make this prediction, you take
those x values for which you want to predict y, plug them into the multiple
regression model, and simplify.
In the ads and plasma TV sales example (see analysis from Figure 5-3), the
best-fitting model is y = 5.26 + 0.162x + 0.249x . In the context of the problem,
1 2
the model is Sales = 5.26 + 0.162 TV ad spending (x ) + 0.249 newspaper ad
1
spending (x ).
2
Remember that the units for plasma TV sales is in millions of dollars and the
units for ad spending for both TV and newspaper ads is in the thousands of
dollars. That is, $20,000 spent on TV ads means x = 20 in the model. Similarly,
1
$10,000 spent on newspaper ads means x = 10 in the model. Forgetting the
2
units involved can lead to serious miscalculations.
Suppose you want to estimate plasma TV sales if you spend $20,000 on TV
ads and $10,000 on newspaper ads. Plug x = 20 and x = 10 into the multiple
1 2
regression model, and you get y = 5.26 + 0.162(20) + 0.249(10) = 10.99. In other
words, if you spend $20,000 on TV advertising and $10,000 in newspaper
advertising, you estimate that sales will be $10.99 million.
This estimate at least makes sense in terms of the data from the 22 store
locations shown in Table 5-1. Location 10 spent $20,000 on TV ads and $5,000
on newspaper ads (short of what you had) and got sales of $9.82 million.
Location 11 spent a little more on TV ads and a lot more on newspaper ads
than what you had and got sales of $16.28 million. Your estimates of sales for
Store Locations 10 and 11 are 5.26 + 0.162 * 20 + 0.249 * 5 = $9.745 million,
and 5.26 + 0.162 * 25 + 0.249 * 25 = $15.535 million, respectively. These estimates
turned out to be pretty close to the actual sales at those two locations ($9.82
million and $16.28 million, respectively, as shown in Table 5-1), giving at least
some confidence that your estimates will be close for the other store loca-
tions not chosen for the study.
Be careful to put in only values for the x variables that fall in the range of
where the data lies. In other words, Table 5-1 shows data for TV ad spending
between $0 and $50,000; newspaper ad spending goes from $0 to $25,000.
It wouldn’t be appropriate to try to estimate sales for spending amounts of
$75,000 for TV ads and $50,000 for newspaper ads, respectively, because the
regression model you came up with only fits the data that you collected. You
have no way of knowing whether that same relationship continues outside
that area. This no-no of estimating y for values of the x variables outside their
range is called extrapolation. As one of my colleagues says, “Friends don’t let
friends extrapolate.”
10_466469-ch05.indd 97 7/24/09 9:32:34 AM