Page 143 - Statistics II for Dummies

Chapter 7: Getting Ahead of the Learning Curve with Nonlinear Regression


                                Examining R² and R² adjusted
                                Finding R², the coefficient of determination (see Chapter 5 for full details), is
                                like the day of reckoning for any model. You can find R² on your regression
                                output, listed as “R-Sq” right under the portion of the output where the
                                coefficients of the variables appear. Figure 7-8 shows the Minitab output for the
                                quiz-score data example; the value of R² in this case is 91.7 percent.

                                The value of R² tells you what percentage of the variation in the y-values the
                                model can explain. To interpret this percentage, note that for a straight-line
                                model with one x variable, R² is the square of r, the correlation coefficient
                                (see Chapter 5). Because values of r beyond ±0.80 are considered to be good,
                                R² values above 0.64 (64 percent) are considered pretty good also, especially
                                for models with only one x variable.
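
                                To see the “square of r” idea in action, here's a minimal Python sketch (an
                                illustration, not Minitab output). It assumes NumPy is installed, uses made-up
                                data rather than the book's quiz scores, and fits a straight line with one
                                x variable. The helper name r_squared is hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the variation in y explained by the fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)      # variation the model leaves unexplained
    sst = np.sum((y - y.mean()) ** 2)   # total variation in the y-values
    return 1.0 - sse / sst

# Made-up data for illustration only (not the quiz-score data)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Least-squares straight line: y_hat = intercept + slope * x
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

r = np.corrcoef(x, y)[0, 1]   # correlation coefficient
r2 = r_squared(y, y_hat)      # coefficient of determination

print(f"r = {r:.4f}, r squared = {r ** 2:.4f}, R-Sq = {r2:.4f}")
```

                                The last two numbers printed should agree, because for a straight-line fit
                                with an intercept the two calculations are the same thing.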

                                You can consider values of R² over 80 percent good and values under 60 percent
                                not good. Those in between I’d consider so-so; they could be better.
                                (This assessment is just my rule of thumb; opinions may vary a bit from one
                                statistician to another.)

                                However, statistics does have such a thing as too many variables spoiling
                                the pot. Every time you add another x variable to a regression model, the
                                value of R² goes up (or at best stays the same), whether the variable really
                                helps or not; it can never go down (this is just a mathematical fact). Right
                                beside R² on the computer output from a regression analysis is the value of
                                R² adjusted, which adjusts the value of R² down a notch for each variable (and
                                each power of each variable) entered into the model. You can’t just throw a ton
                                of variables into a model, letting their tiny increments add up to an
                                acceptable R² value, without taking a hit for throwing in everything but the
                                kitchen sink.
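
                                The penalty can be sketched in Python. This example assumes the standard
                                textbook formula, R² adjusted = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is
                                the number of observations and p is the number of x terms in the model. The
                                data, the useless "junk" variable, and the function name fit_r2 are all made
                                up for illustration.

```python
import numpy as np

def fit_r2(X, y):
    """Least-squares fit with an intercept; returns (R-Sq, R-Sq adjusted)."""
    n, p = X.shape                            # n observations, p x terms
    A = np.column_stack([np.ones(n), X])      # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = np.sum(resid ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    # Assumed textbook formula for the adjustment
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj

rng = np.random.default_rng(0)                # fixed seed so reruns match
n = 20
x = np.linspace(1, 10, n)
y = 3.0 + 2.0 * x + rng.normal(scale=2.0, size=n)   # made-up response
junk = rng.normal(size=n)                     # unrelated to y by construction

r2_one, adj_one = fit_r2(x.reshape(-1, 1), y)
r2_two, adj_two = fit_r2(np.column_stack([x, junk]), y)

print(f"x only:     R-Sq = {r2_one:.4f}, R-Sq(adj) = {adj_one:.4f}")
print(f"x and junk: R-Sq = {r2_two:.4f}, R-Sq(adj) = {adj_two:.4f}")
```

                                R-Sq for the second model can't come out lower than for the first, even though
                                the junk variable has nothing to do with y. R² adjusted gets no such free ride.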

                                To be on the safe side, you should always use R² adjusted to assess the fit of
                                your model, rather than R², especially if you have more than one x variable in
                                your model (or more than one power of an x variable). The values of R² and
                                R² adjusted are close if you have only a couple of different variables (or
                                powers) in the model, but as the number of variables (or powers) increases,
                                so does the gap between R² and R² adjusted. In that case, R² adjusted is the
                                fairest and most consistent coefficient to use to examine model fit.
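
                                Here's a small Python sketch of how that gap widens. It assumes the standard
                                textbook formula for R² adjusted and holds R² fixed at 91.7 percent while the
                                number of x terms p grows, which is a simplification (in practice R² creeps up
                                as terms are added). The sample size n = 20 is an assumption, because this page
                                doesn't state one, and the function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """R-Sq adjusted for n observations and p x terms (textbook formula)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than model terms")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2 = 0.917   # R-Sq held fixed for illustration
n = 20       # assumed sample size, not taken from the book

gaps = []
for p in range(1, 7):
    adj = adjusted_r2(r2, n, p)
    gaps.append(r2 - adj)
    print(f"p = {p}: R-Sq(adj) = {adj:.4f}, gap = {r2 - adj:.4f}")
```

                                Each extra term shrinks n − p − 1, so the same R² takes a bigger hit.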

                                In the quiz-score example (analysis shown in Figure 7-8), the value of R² adjusted
                                is 90.7 percent, which is still a very high value, meaning that the quadratic
                                model fits this data very well. (See Chapter 6 for more on R² and R² adjusted.)

                                Checking the residuals
                                You’ve looked at the scatterplot of your data and the value of R² is high.
                                What’s next? Now you examine how well the model fits each individual point
                                in the data to make sure you can’t find any spots where the model is way off
                                or places where you missed another underlying pattern in the data.
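
                                A point-by-point check like this can be sketched in Python. The example below
                                assumes NumPy, uses made-up data (not the quiz scores), and fits a quadratic
                                model. Dividing each residual by the overall residual standard deviation is a
                                simplified way to standardize, and the cutoff of 2 is a common rule of thumb
                                assumed here, not something this page states.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([1.2, 3.8, 9.5, 15.7, 25.3, 36.4, 48.6, 64.5, 80.7, 100.4])

# Quadratic least-squares fit: y_hat = c0 + c1*x + c2*x^2
c2, c1, c0 = np.polyfit(x, y, deg=2)
fitted = c0 + c1 * x + c2 * x ** 2

# A residual is the observed y minus the y the model predicts
residuals = y - fitted

# Rough standardization: divide by the residual standard deviation.
# Three coefficients were estimated, so the degrees of freedom are n - 3.
n = len(y)
s = np.sqrt(np.sum(residuals ** 2) / (n - 3))
standardized = residuals / s

# Flag points where the model is far off (assumed cutoff of 2)
flagged = np.flatnonzero(np.abs(standardized) > 2)

for xi, res, z in zip(x, residuals, standardized):
    print(f"x = {xi:4.1f}  residual = {res:7.3f}  standardized = {z:6.2f}")
print("flagged positions:", flagged.tolist())
```

                                Besides flagged points, look down the residual column for runs of the same
                                sign, which can hint at a pattern the model missed.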







