Page 117 - Statistics II for Dummies

Chapter 5: Multiple Regression with Two X Variables  101


As an alternative check for normality, apart from using the regular residuals,
you can look at the standardized residuals plot (see Figure 5-5) and check out
the upper-right plot. It shows how the residuals are distributed across the
various estimated (fitted) values of y. Standardized residuals are supposed to
follow a standard normal distribution; that is, a normal distribution with a
mean of zero and a standard deviation of one. So when you look at the
standardized residuals, they should be centered around zero with no
predictable pattern, and with the same amount of variability around the
horizontal line at zero as you move from left to right.
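
A minimal sketch of how standardized residuals might be computed for a model
with two x variables, written in Python with NumPy. It assumes one common
definition, in which each regular residual is divided by the regression
standard error s times sqrt(1 - h), where h is that observation's leverage;
software packages may define or label standardized residuals differently. The
function name `standardized_residuals` and the simulated data are hypothetical
and are not the book's ads and sales data.

```python
import numpy as np


def standardized_residuals(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by least squares.

    Returns (fitted values, standardized residuals), where each standardized
    residual is e_i / (s * sqrt(1 - h_ii)): the regular residual e_i divided by
    the regression standard error s, adjusted for the leverage h_ii.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x1, x2])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares coefficients
    fitted = X @ beta
    resid = y - fitted                              # regular residuals
    n, p = X.shape
    s = np.sqrt(resid @ resid / (n - p))            # regression standard error
    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return fitted, resid / (s * np.sqrt(1 - h))


# Hypothetical data: 22 simulated observations, NOT the book's ads and sales data
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 22)
x2 = rng.uniform(0, 10, 22)
y = 5 + 2 * x1 + 3 * x2 + rng.normal(0, 2, 22)

fitted, std_resid = standardized_residuals(x1, x2, y)
print(f"mean = {std_resid.mean():.2f}, SD = {std_resid.std(ddof=1):.2f}")
```

On data like these, the printed mean should be near zero and the SD near one,
which is what the standardized residuals plot is meant to let you eyeball.
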

In the upper-right plot of Figure 5-5, you should also find that most (about
95 percent) of the standardized residuals fall within two standard deviations
of the mean, which in this case means between –2 and +2 (by the 68-95-99.7
Rule; remember that from Stats I?). You should see more residuals hovering
around zero (where the middle lump would be on a standard normal
distribution), and fewer and fewer residuals as you move away from zero. The
upper-right plot in Figure 5-5 supports a normal distribution for the ads and
sales example on all the counts mentioned here.
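
The 68-95-99.7 Rule check can be sketched in a few lines. The helper
`empirical_rule_check` is hypothetical, and the 1,000 simulated standard
normal values only stand in for standardized residuals so that the fractions
settle near 0.68, 0.95, and 0.997.

```python
import numpy as np


def empirical_rule_check(z):
    """Fraction of values within 1, 2, and 3 units of zero.

    For standardized residuals from a well-behaved model, these fractions
    should be roughly 0.68, 0.95, and 0.997 (the 68-95-99.7 Rule).
    """
    z = np.asarray(z, dtype=float)
    return {k: float(np.mean(np.abs(z) <= k)) for k in (1, 2, 3)}


# Stand-in for standardized residuals: simulated standard normal draws
rng = np.random.default_rng(2)
print(empirical_rule_check(rng.standard_normal(1000)))
```

With a small data set the fractions are much rougher: 5 percent of 22
observations is 0.05 × 22 = 1.1, so one standardized residual beyond –2 or +2
is about what you'd expect and isn't a problem by itself.
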

The lower-left plots of Figures 5-4 and 5-5 show histograms of the regular
and standardized residuals, respectively. These histograms should reflect a
normal distribution: the shape should be roughly symmetric and look like a
bell-shaped curve. If the data set is small (as is the case here, with only
22 observations), the histogram may not look as close to normal as you would
like; in that case, treat it as one part of the body of evidence that all four
residual plots give you. The histograms in the lower-left plots of Figures 5-4
and 5-5 aren't terribly normal-looking; however, because you can't see any
glaring problems in the upper-right plots, there's no need to worry.
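
To see why a small data set can produce an unconvincing histogram, the sketch
below prints a text histogram of 22 values drawn from an exactly standard
normal distribution. The helper `text_histogram` and the simulated values are
hypothetical stand-ins for the standardized residuals; rerunning with
different seeds shows how lumpy the bars can be even when the underlying
distribution is perfectly normal.

```python
import numpy as np


def text_histogram(values, edges=(-3, -2, -1, 0, 1, 2, 3)):
    """Return one text row per bin, with one '*' per value in that bin.

    Values outside the outermost edges are not counted; as with
    np.histogram, the last bin includes its right edge.
    """
    counts, edges = np.histogram(values, bins=edges)
    return [
        f"{float(lo):+.0f} to {float(hi):+.0f}: {'*' * int(c)}"
        for c, lo, hi in zip(counts, edges[:-1], edges[1:])
    ]


# Hypothetical: 22 draws from an exactly standard normal distribution,
# standing in for 22 standardized residuals
rng = np.random.default_rng(3)
for row in text_histogram(rng.standard_normal(22)):
    print(row)
```
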

Satisfying the second condition: Variance
The second condition in checking the multiple regression model is that the
residuals have the same variance for each fitted (predicted) value of y. Look
again at the upper-right plot of Figure 5-4 (or Figure 5-5). You shouldn't see
any change in the amount of spread (variability) in the residuals around the
horizontal line as you move from left to right. In the upper-right graph of
Figure 5-4, nothing suggests that condition number two hasn't been met.
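
One rough way to put numbers on "no change in spread from left to right" is to
sort the residuals by fitted value, split them into groups, and compare the
standard deviations of the groups. This is a hypothetical helper
(`spread_by_fitted_group`) applied to simulated data with constant error
variance; it is an informal check, not a formal test and not the book's method.

```python
import numpy as np


def spread_by_fitted_group(fitted, resid, n_groups=3):
    """Split residuals into n_groups by fitted value (left to right on the
    residuals-versus-fits plot) and return the sample SD of each group."""
    order = np.argsort(fitted)
    groups = np.array_split(np.asarray(resid, dtype=float)[order], n_groups)
    return [float(np.std(g, ddof=1)) for g in groups]


# Hypothetical two-X data with constant error SD (simulated, not the book's data)
rng = np.random.default_rng(4)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 5 + 2 * x1 + 3 * x2 + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
fitted = X @ beta
resid = y - fitted

# The group SDs should be similar when the variance is constant
print(spread_by_fitted_group(fitted, resid))
```
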

One particular pattern that raises a red flag for the second condition is
residuals that fan out, or increase in spread, as you move from left to right
on the upper-right plot. Fanning out means that the variability grows larger
and larger for higher and higher predicted values of y, so the condition of
equal variability around the fitted line isn't met, and the regression model
wouldn't fit well in that case.
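
A fan shape can also be flagged numerically: if the spread grows with the
predicted value, the absolute residuals tend to rise with the fitted values,
so their correlation is clearly positive. The sketch below uses hypothetical
helpers (`fit_two_x`, `fan_out_correlation`) and simulated data (a large n of
1,000 so the pattern is clear) to compare a constant-variance model with one
whose error SD is proportional to the mean of y. It is an informal companion to
the plot; formal tests for unequal variance, such as the Breusch-Pagan test,
also exist.

```python
import numpy as np


def fit_two_x(x1, x2, y):
    """Least-squares fit of y on x1 and x2 with an intercept; returns (fitted, resid)."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted


def fan_out_correlation(fitted, resid):
    """Correlation between fitted values and absolute residuals: near 0 when the
    spread is roughly constant, clearly positive when residuals fan out."""
    return float(np.corrcoef(fitted, np.abs(resid))[0, 1])


# Hypothetical simulated data (not the book's): same mean, two error structures
rng = np.random.default_rng(5)
n = 1000
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
mean_y = 5 + 2 * x1 + 3 * x2
y_equal = mean_y + rng.normal(0, 2, n)               # constant error SD
y_fan = mean_y + 0.2 * mean_y * rng.normal(0, 1, n)  # error SD grows with mean of y

for label, y in [("equal spread", y_equal), ("fanning out", y_fan)]:
    fitted, resid = fit_two_x(x1, x2, y)
    print(f"{label}: r = {fan_out_correlation(fitted, resid):.2f}")
```

Expect the first correlation to be close to zero and the second to be
noticeably positive.
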










