Page 160 - Intermediate Statistics for Dummies
                                                  Chapter 7: When Data Throws You a Curve: Using Nonlinear Regression
Figure 7-5 shows the Minitab output for the quiz-score data example; the value
of R² in this case is 91.7 percent. The value of R² tells you what percentage
of the variation in the y-values the model can explain. To interpret this
percentage, the closer a value of R² is to 100 percent, the better. You can
consider values of R² over 80 percent good. Values under 60 percent aren’t
good. Those in between I’d consider to be so-so; they could be better. (This
assessment is just my rule of thumb; opinions may vary a bit from one
statistician to another.)
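
To make "percentage of the variation explained" concrete, here is a minimal Python sketch that computes R² as 1 minus the ratio of leftover variation to total variation in the y-values. The function name `r_squared` and the small data set are made up for illustration; they are not the quiz-score data from Figure 7-5, and the sketch assumes the predictions come from a least-squares model that includes an intercept.

```python
def r_squared(y, y_hat):
    """Return R^2 = 1 - SSE/SST for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    # Variation the model leaves unexplained (sum of squared errors)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation of the y-values around their mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical observed values and model predictions (not from the book)
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat = [2.0, 4.0, 6.0, 8.0, 10.0]

print(f"R-squared: {100 * r_squared(y, y_hat):.1f} percent")
```

Multiplying by 100 puts the result on the same percent scale the text uses.
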
However, you can find such a thing in statistics as too many variables
spoiling the pot. Right beside R² on the computer output from any regression
analysis is the value of R² adjusted, which adjusts the value of R² down a
notch for each variable (and each power of each variable) entered into the
model. That way, you can’t just throw a ton of variables into a model, each
adding a tiny increment until they all add up to an acceptable R² value,
without taking a hit for throwing everything in the model but the kitchen sink.

To be on the safe side, you can always use R² adjusted to assess the fit of
your model, rather than R². But you should always use R² adjusted if you have
more than one x variable in your model (or more than one power of an x
variable). The values of R² and R² adjusted will be close if you have only a
couple of different variables (or powers) in the model, but as the number of
variables (or powers) increases, so does the gap between R² and R² adjusted.
In that case, R² adjusted is the fairest and most consistent coefficient to
use to examine model fit.
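
The "down a notch" adjustment can be sketched with the usual textbook formula, R² adjusted = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of x terms (variables or powers) in the model. The text here doesn't print a formula, so treat this as an assumption about what the software reports; the function name and the numbers below are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjust R^2 (given as a proportion) for p model terms and n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than terms in the model")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: hold R^2 at 0.90 with 25 observations and add more terms.
# The gap between R^2 and adjusted R^2 widens as p grows.
for p in (1, 2, 5, 10):
    adj = adjusted_r_squared(0.90, n=25, p=p)
    print(f"p = {p:2d}: R-squared adjusted = {100 * adj:.1f} percent")
```

Holding R² fixed while increasing p shows the penalty on its own; in practice R² also creeps up as terms are added, which is exactly why the adjusted value is the safer one to compare.
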
In the quiz-score example (analysis shown in Figure 7-5), the value of R²
adjusted is 90.7 percent, still a very high value, meaning the quadratic model
fits this data very well. (See Chapter 6 for more on R² and R² adjusted.)
Checking the residuals

You’ve looked at the scatterplot of your data and the value of R² is high.
What’s next? Now you want to examine how well the model fits each individual
point in the data set, to make sure you can’t find any spots where the model
is way off or places where you missed another underlying pattern in the data.

A residual is the amount of error, or leftover, that occurs when you fit a
model to a data set. For each observed y-value in the data set, you also have
a predicted value from the model, typically called y-hat. The residual is the
difference between the observed value of y and y-hat (that is, y minus y-hat).
Each y-value in the data set has a residual; you examine all the residuals
together as a group, looking for patterns or unusually high values (indicating
a big difference between the observed y and the predicted y at that point; see
Chapter 4 for the full info on residuals and their plots).
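
As a rough sketch of that check, the Python below computes one residual per data point and flags any that are unusually large. The data, the function names, and the cutoff of two standard deviations are all assumptions made for illustration; this section doesn't prescribe a specific cutoff, and a residual plot is still the better tool for spotting patterns.

```python
import statistics

def residuals(y, y_hat):
    """Return one residual (observed y minus predicted y-hat) per data point."""
    return [yi - fi for yi, fi in zip(y, y_hat)]

def flag_large(res, threshold=2.0):
    """Return positions of residuals larger in size than `threshold` sample
    standard deviations of the residuals (an assumed rule of thumb)."""
    sd = statistics.stdev(res)
    return [i for i, r in enumerate(res) if abs(r) > threshold * sd]

# Hypothetical observed values and model predictions (not from the book)
y = [10.0, 12.0, 15.0, 30.0, 18.0, 21.0, 23.0, 26.0]
y_hat = [10.5, 12.5, 14.5, 16.5, 18.5, 20.5, 22.5, 24.5]

res = residuals(y, y_hat)
for i in flag_large(res):
    print(f"point {i}: observed {y[i]}, predicted {y_hat[i]}, residual {res[i]}")
```

With this made-up data, only the fourth point (position 3, counting from zero) is flagged: the model is way off there and close everywhere else.
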