Page 100 - Intermediate Statistics for Dummies
P. 100

09_045206 ch04.qxd  2/1/07  9:49 AM  Page 79
                                                                Chapter 4: Getting in Line with Simple Linear Regression
                                                    other words, you can discover that looking at errors helps you assess the fit of
                                                    the model and diagnose problems that caused a bad fit, if that was the case.
                                                    Finding the residuals
                                                    A residual is the difference between the observed value of y (from the best-
                                                    fitting line) and the predicted value of y (from the data set). Specifically, for
                                                    any data point, you take its observed y-value (from the data) and subtract the
                                                    expected y-value (from the line). If the residual is large, the line doesn’t fit
                                                    well in that spot. If the residual is small, the line fits well in that spot.
                                                    For example, suppose you have a point in your data set (2, 4) and the equa-
                                                    tion of the best-fitting line is y = 2x +1. The expected value of y in this case
                                                    is 2  2 + 1 = 5. The observed value of y from the data set is 4. Taking the
                                                       *
                                                    observed value minus the estimated value you get 4 – 5 = –1. The residual for
                                                    that particular data point (2, 4) is –1. If you observe a y-value of 6 and use the
                                                    same straight line to estimate y, then the residual would be 6 – 5 = +1.
                                                    In general, a positive residual means you underestimated y at that point, and  79
                                                    a negative residual means you overestimated y at that point.
                                                    Standardizing the residuals
                                                    To make interpreting the residuals easier, statisticians typically standardize
                                                    them; that is, subtract the mean of the residuals (zero) and divide by the stan-
                                                    dard deviation of all the residuals. The residuals are a data set just like any
                                                    other data set, so you can find their mean and standard deviation like you
                                                    always do. Standardizing just means converting to a Z-score, so you see where
                                                    it falls on the standard normal distribution.
                                                    Making residual plots
                                                    You can plot the residuals on a graph called a residual plot. (If you’ve stan-
                                                    dardized the residuals, you call it a standardized residual plot.) Figure 4-4
                                                    shows Minitab output for a variety of standardized residual plots, all getting
                                                    at the same idea: checking to be sure the conditions of the simple linear
                                                    regression model are met.
                                                    Checking normality
                                                    If the condition of normality is met, you can see on the residual plot lots of
                                                    (standardized) residuals close to zero; as you move farther and farther away
                                                    from zero, you can see fewer and fewer residuals. Note: A standardized resid-
                                                    ual at or beyond +3 or –3 is something you shouldn’t expect to see. If this
                                                    occurs, you can consider that point an outlier, which warrants further investi-
                                                    gation. (For more on outliers, see the section “Scoping for outliers.”)
                                                    The residuals should also occur at random — some above the line, some below
                                                    the line. If a pattern occurs in the residuals, the line may not be fitting right.



                                                                             @Spy
   95   96   97   98   99   100   101   102   103   104   105