Page 91 - Statistics II for Dummies

Chapter 4: Getting in Line with Simple Linear Regression


                                residual at or beyond +3 or –3. If this occurs, you can consider that point an
                                outlier, which warrants further investigation. (For more on outliers, see the
                                section “Scoping for outliers” later in this chapter.)
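
A minimal sketch of the ±3 rule, assuming "standardized residual" means the raw residual divided by its estimated standard deviation (residual standard error times the square root of 1 minus the leverage). The data below are made up for illustration and are not the values from Table 4-1; the function names are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return standardized residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error uses n - 2 degrees of freedom (slope, intercept).
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

def flag_outliers(std_resid, cutoff=3.0):
    """Return the indices whose standardized residual is at or beyond +/- cutoff."""
    return [i for i, r in enumerate(std_resid) if abs(r) >= cutoff]

# Hypothetical data (NOT Table 4-1): a straight line with small wiggles,
# plus one observation pulled far below the line.
x = list(range(1, 21))
y = [3 + 0.5 * xi + (0.2 if i % 2 == 0 else -0.2) for i, xi in enumerate(x)]
y[9] -= 6.0  # the deliberately odd point

std_resid = standardized_residuals(x, y)
flagged = flag_outliers(std_resid)
print("Flagged indices:", flagged)
```

A flagged point is a candidate for a closer look, not an automatic deletion.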

                                The residuals should also fall at random, some above the line and some
                                below it. If a pattern shows up in the residuals, the line may not fit the data well.

                                The plots in Figure 4-6 seem to have an issue with the very last observation,
                                the one for 12th graders. In this observation, the average student weight
                                (142) follows the pattern of increasing with each grade level, but
                                the textbook weight (16.06) is less than the one for 11th graders (20.79),
                                which makes this the first point to break the pattern.

                                You can also see in the plot in the upper-right corner of Figure 4-6 that the very
                                last data value has a standardized residual that sticks out from the others and
                                has a value of –3 (something that should be a very rare occurrence). So the value
                                you observed for y was 3 standard deviations away from the value your line
                                predicted. And because this residual is negative, what you observed for y was much
                                lower than you would have expected it to be using the regression line.

                                The other residuals seem to fall in line with a normal distribution, as you can
                                see in the upper-right plot of Figure 4-6. The residuals concentrate around
                                zero, with fewer appearing as you move farther away from zero. You can also
                                see this pattern in the upper-left plot of Figure 4-6, which shows how close to
                                normal the residuals are. The line in this graph represents the equal-to-normal
                                line. If the residuals follow close to the line, then normality is okay. If not, you
                                have problems (in a statistical sense, of course). You can see that the residual
                                with the largest magnitude is –3, and that point falls well off the line.
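
The idea behind the normal plot can be sketched in a few lines: sort the standardized residuals and pair each one with the normal quantile you would expect at that rank. The plotting position (i - 0.5)/n is one common choice and is an assumption here, as is using y = x as the reference line; statistical software may use a slightly different formula. The residual values are made up for illustration.

```python
from statistics import NormalDist

def normal_plot_points(std_resid):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(std_resid)
    ordered = sorted(std_resid)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    positions = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in positions]
    return list(zip(theoretical, ordered))

# Hypothetical standardized residuals (NOT the book's values), one of them at -3.
std_resid = [0.4, -0.2, 0.9, 0.1, -0.6, 0.7, -0.3, 0.5, 0.2, -0.1, 0.8, -3.0]

points = normal_plot_points(std_resid)

# If the residuals were close to standard normal, each pair would sit near
# the line y = x. Find the point that strays farthest from that line.
worst = max(points, key=lambda p: abs(p[1] - p[0]))
print("Farthest from the line (expected, observed):", worst)
```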

                                The lower-left plot in Figure 4-6 shows a histogram of the standardized
                                residuals, and you can see it doesn’t look much like a bell-shaped distribution.
                                It doesn’t even look symmetric (the same on each side when you cut it down the
                                middle). The problem again seems to be the residual of –3, which skews the
                                histogram to the left.
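
To see how a single residual of –3 can pull a histogram to the left, here is a rough sketch that bins a set of made-up standardized residuals and computes a simple skewness coefficient (third central moment divided by the 1.5 power of the second). The data, bin edges, and function names are all assumptions for illustration.

```python
def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts

def skewness(values):
    """Moment-based skewness: negative means a longer left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical standardized residuals (NOT the book's values).
std_resid = [0.4, -0.2, 0.9, 0.1, -0.6, 0.7, -0.3, 0.5, 0.2, -0.1, 0.8, -3.0]
edges = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5]

counts = bin_counts(std_resid, edges)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"[{lo:5.1f}, {hi:5.1f})  {'*' * c}")

skew_all = skewness(std_resid)
skew_trimmed = skewness(std_resid[:-1])  # same data without the -3 value
print("Skewness with and without the -3 residual:", skew_all, skew_trimmed)
```

Comparing the two skewness values gives a quick feel for how much of the lopsidedness comes from that one point.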

                                The lower-right plot of Figure 4-6 plots the residuals in the order presented
                                in the data set in Table 4-1. Because the data was ordered already, the lower-
                                right residual plot looks like the upper-right residual plot in Figure 4-6, except
                                the dots are connected. This lower-right residual plot makes the residual of
                                –3 stand out even more.

                                Checking the spread of the y’s for each x
                                The graph in the upper-right corner of Figure 4-6 also addresses the
                                homoscedasticity condition. If the condition is met, then the residuals for every
                                x-value have about the same spread. If you cut a vertical line down through each
                                x-value, the residuals have about the same spread (standard deviation) each time,
                                except for the last x-value, which again represents grade 12. Apart from that one
                                observation, the condition of equal spread in the y-values appears to be met for
                                the textbook-weight example.
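
An informal way to put numbers on "about the same spread" is to sort the residuals by x, split them into a few neighboring groups, and compare the standard deviation of each group. This is only a rough sketch, not a formal test: the grouping into three slices, the function name, and the data are assumptions, and the values are made up rather than taken from the textbook-weight example.

```python
from statistics import stdev

def spread_by_group(x, resid, groups=3):
    """Standard deviation of the residuals within consecutive slices of x."""
    ordered = [r for _, r in sorted(zip(x, resid))]
    n = len(ordered)
    # Slice boundaries that split the sorted residuals into near-equal groups.
    bounds = [k * n // groups for k in range(groups + 1)]
    return [stdev(ordered[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

# Hypothetical x-values and standardized residuals (NOT the book's values);
# the last observation carries the residual of -3.
x = list(range(1, 13))
std_resid = [0.4, -0.2, 0.9, 0.1, -0.6, 0.7, -0.3, 0.5, 0.2, -0.1, 0.8, -3.0]

spreads_all = spread_by_group(x, std_resid)
spreads_trimmed = spread_by_group(x[:-1], std_resid[:-1])

print("Spread per group, all points:      ", spreads_all)
print("Spread per group, last point dropped:", spreads_trimmed)
```

If the group spreads are roughly comparable once the odd point is set aside, that supports reading the rest of the plot as having equal spread.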






