Page 102 - Intermediate Statistics for Dummies
P. 102

09_045206 ch04.qxd  2/1/07  9:49 AM  Page 81
                                                                Chapter 4: Getting in Line with Simple Linear Regression
                                                    middle). The problem again seems to be the residual of –3, which makes the
                                                    histogram be skewed to the left.
                                                    The lower-right plot of Figure 4-4 plots the residuals in the order presented in
                                                    the data set in Table 4-1. Because the data was ordered already, the lower-
                                                    right residual plot looks like the upper-right residual plot in Figure 4-4, except
                                                    the dots are connected. This lower-right residual plot makes the residual
                                                    of –3 stand out even more.
                                                    Checking the spread of the y’s for each x
                                                    The graph in the upper-right corner of Figure 4-4 also addresses the
                                                    homoscedasticity condition. If the condition is met, then the residuals for
                                                    every x-value have about the same spread. If you cut a straight line down
                                                    through each x-value, the residuals have about the same spread (standard
                                                    deviation) each time, except for the last x-value, which again represents
                                                    grade twelve. That means the condition of equal spread in the y-values is met
                                                    for the backpack example.
                                                    If you look at only one residual plot, choose the one in the upper-right corner  81
                                                    of Figure 4-4, the plot of the fitted values (the values of y on the line) versus
                                                    the standardized residuals. Most problems with model fit pop up on that plot
                                                    because a residual is defined as the difference between the observed value
                                                    of y and the fitted value of y. In a perfect world, all the fitted values have no
                                                    residual at all; a large residual (such as the one where the estimated weight is
                                                    20 pounds for twelfth graders; see Figure 4-4) is indicated by a point far off
                                                    from zero. This graph also shows you deviations from the overall pattern of
                                                    the line; for example, if large residuals are on the extremes of this graph (very
                                                    low or very high fitted values), that shows the line isn’t fitting in those areas.
                                                               2
                                                    Using r to measure model fit
                                                    One important way to assess how well the model fits is to measure the value
                                                       2
                                                    of r , where r is the correlation coefficient. Statisticians measure how well a
                                                    model fits by looking at what percentage of the variability in y is explained by
                                                    the model.
                                                    The y-values of the data you collect have a great deal of variability in and of
                                                    themselves. You look for another variable (x) that helps you explain that vari-
                                                    ability in the y-values. After you put that x variable into the model, and you
                                                    find it’s highly correlated with y, you want to find out how well this model did
                                                    at explaining why the values of y are different.
                                                                            2
                                                    As it turns out, the value of r , gives you that measure of model fit. Because
                                                    squaring a number between 0 and +1 makes the result get smaller (except for
                                                                                2
                                                    0 and +1), how do you interpret r ? A value of r = +0.9 or –0.9 is quite high;
                                                                             @Spy
   97   98   99   100   101   102   103   104   105   106   107