Page 376 - Six Sigma Demystified



                        relation coefficient $R^2$ approaches 1 as the number of factors approaches the
                        number of data values. That is, if we have five factors and eight data values,
                        $R^2$ may be close to 1 regardless of whether the fit is good or not. $R^2$ never
                        decreases as factors are added, whether the factors are significant or not. An
                        adjusted value, $R^2_a$, is therefore calculated for multiple regression models;
                        it corrects $R^2$ for the number of parameters in the model. $R^2_a$ is always
                        less than or equal to $R^2$ and provides a better approximation of the amount of
                        variation accounted for by the model. In the preceding example, the $R^2_a$
                        statistic is calculated as 0.385: approximately 39 percent of the variation in
                        the response is explained by the regression function. Values near 0.7 or higher
                        generally are considered acceptable.
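                        The effect of the adjustment is easy to see numerically. The Python sketch below
                        assumes the common textbook definitions $R^2 = 1 - SSE/SST$ and
                        $R^2_a = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, where n is the number of data values
                        and k the number of factors (intercept not counted). The formulas are standard
                        but not quoted from this chapter, and the function names are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE/SST."""
    n = len(observed)
    mean_y = sum(observed) / n
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # error sum of squares
    sst = sum((y - mean_y) ** 2 for y in observed)              # total sum of squares
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for k factors (intercept excluded) fitted to n data values."""
    if n - k - 1 <= 0:
        raise ValueError("need more data values than model parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Five factors and eight data values, as in the text: an apparently high
# R^2 of 0.90 shrinks considerably once the parameter count is accounted for.
print(round(adjusted_r_squared(0.90, n=8, k=5), 3))   # 0.65
# The same R^2 from 30 data values is penalized much less.
print(round(adjusted_r_squared(0.90, n=30, k=5), 3))  # 0.879
```

                        With only two residual degrees of freedom, the adjustment removes a large share
                        of the apparent fit; with more data it matters much less.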
                          A large $R^2$ value does not imply that the slope of the regression line is
                        steep, that the correct model was used, or that the model will predict future
                        observations accurately. It simply means that the model happens to account for
                        a large percentage of the variation in this particular set of data.
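                        A small sketch makes the first point concrete: it fits a straight line to made-up
                        data whose slope is only 0.01 but whose scatter is tiny. The data and the
                        `fit_line` helper are hypothetical; with these values $R^2$ comes out close to 1
                        even though the response barely changes with the factor.

```python
def fit_line(x, y):
    """Least-squares slope, intercept, and R^2 for a straight-line fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - sse / sst


# Hypothetical data: a nearly flat line (true slope 0.01) plus tiny alternating scatter.
x = list(range(1, 11))
y = [0.01 * xi + (0.0005 if xi % 2 else -0.0005) for xi in x]
slope, intercept, r2 = fit_line(x, y)
print(f"slope = {slope:.4f}, R^2 = {r2:.4f}")
```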
                          A t test is performed on each of the model parameters, and a p value is
                        reported for each. If the p value is less than 0.05 (5 percent), the parameter is
                        considered significant and should be retained in the model. (In some cases, such
                        as when we have limited data from an initial study, we may choose a higher
                        threshold, such as 0.10.) In this example, only factor B appears to be
                        significant; the p values for factors A and C both greatly exceed even the 0.10
                        threshold. The $R^2_a$ of 0.385 indicates a relatively poor fit, implying that the
                        model may be missing terms.
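                        The retain-or-drop rule amounts to comparing each term's p value with the chosen
                        threshold. In this sketch the `screen_terms` name and the p values are
                        hypothetical (they are not the output of the preceding example); they are merely
                        chosen to match the pattern described, where only B falls below either threshold.

```python
def screen_terms(p_values, alpha=0.05):
    """Split model terms into retained and not-significant lists.

    p_values maps a term name to the p value from its t test; alpha is the
    significance threshold (0.05 by default, 0.10 for limited initial data).
    """
    retained = sorted(term for term, p in p_values.items() if p < alpha)
    not_significant = sorted(term for term, p in p_values.items() if p >= alpha)
    return retained, not_significant


# Hypothetical p values for illustration only.
p = {"A": 0.62, "B": 0.003, "C": 0.41}
print(screen_terms(p))              # (['B'], ['A', 'C'])
print(screen_terms(p, alpha=0.10))  # (['B'], ['A', 'C'])
```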
                          A variance inflation factor (VIF) also may be evaluated for each factor to
                        detect multicollinearity, which occurs when the factors are correlated with one
                        another. Any factor with a VIF between 5 and 10 is suspect; factors with a VIF
                        exceeding 10 should be removed.
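                        The VIF for factor j is commonly computed as $1/(1 - R_j^2)$, where $R_j^2$ comes
                        from regressing factor j on all the other factors. The pure-Python sketch below
                        follows that standard definition (assumed here, not quoted from this chapter) and
                        applies the 5 and 10 cutoffs from the text. The function names and data are
                        hypothetical, and it assumes no factor is constant or an exact linear combination
                        of the others; statistical software would normally report these values directly.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _r_squared_on(y, predictors):
    """R^2 from an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = _solve(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(factors):
    """VIF_j = 1 / (1 - R_j^2), regressing factor j on all the other factors."""
    result = {}
    for name, column in factors.items():
        others = [col for other, col in factors.items() if other != name]
        result[name] = 1.0 / (1.0 - _r_squared_on(column, others))
    return result


def vif_flag(value):
    """Screening rule from the text: 5 to 10 is suspect, above 10 should be removed."""
    if value > 10:
        return "remove"
    if value >= 5:
        return "suspect"
    return "ok"


# Hypothetical data: x2 is almost exactly twice x1, so the two are highly correlated.
factors = {"x1": [1, 2, 3, 4, 5], "x2": [2, 4, 6, 8, 11]}
for name, v in vif(factors).items():
    print(name, round(v, 1), vif_flag(v))  # both VIFs are about 122, flagged "remove"
```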

                        Removing Terms from the Multiple Regression Model

                        When reviewing the t test results and VIF values for the individual factors, we
                        are considering whether each factor provides benefit in estimating the response.
                        When removing terms from the model, remove only one term at a time, because the
                        error is partially reapportioned among the remaining parameters each time a
                        parameter is removed.
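                        The one-term-at-a-time procedure can be sketched as a backward-elimination loop
                        that refits after every removal. This is a minimal sketch, assuming a
                        hypothetical `fit` callback (standing in for whatever regression software is
                        used) that returns each term's p value. The stand-in fitter and its p values are
                        made up to show how a term's p value can change once another term is removed.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    terms: List[str],
    fit: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Remove non-significant terms one at a time, refitting after each removal."""
    current = list(terms)
    while current:
        p_values = fit(current)  # refit so the error is reapportioned among remaining terms
        worst = max(current, key=lambda t: p_values[t])
        if p_values[worst] < alpha:
            break                # every remaining term is significant
        current.remove(worst)    # drop only the single least significant term
    return current


# Hypothetical stand-in fitter: p values are made up and shift after a refit.
def fake_fit(current):
    table = {
        ("A", "B", "C"): {"A": 0.30, "B": 0.01, "C": 0.45},
        ("A", "B"): {"A": 0.04, "B": 0.01},
    }
    return table[tuple(current)]


print(backward_eliminate(["A", "B", "C"], fake_fit))  # ['A', 'B']
```

                        In this made-up case, A is not significant in the full model but becomes
                        significant once C is removed; dropping A and C together would have discarded it.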
                          It is recommended to remove the highest-order terms first (for example,
                        third-order interactions before second-order interactions). In fact, we often
                        don’t include higher-order terms in initial studies, so that the factors that are
                        not significant can be eliminated using less data. Factors with borderline
                        significance, such as those with a p value between 0.05 and 0.10, are best left
                        in the model, particularly at the early stages.
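                        The two guidelines above, removing the highest-order terms first and keeping
                        borderline terms early on, can be combined into a rule for choosing the next
                        term to drop. In this sketch the `A*B*C` naming convention, the
                        `next_term_to_remove` function, and the p values are all hypothetical, and the
                        model would be refit between calls.

```python
def next_term_to_remove(p_values, alpha=0.10):
    """Pick the next candidate for removal, or None if nothing should go.

    Terms are named like "A", "A*B", "A*B*C"; a term's order is taken to be the
    number of factors joined by "*" (a naming convention assumed here). Using
    alpha = 0.10 keeps borderline terms with 0.05 <= p < 0.10 in the model.
    """
    candidates = [t for t, p in p_values.items() if p >= alpha]
    if not candidates:
        return None
    # Highest order first; among terms of equal order, the largest p value first.
    return max(candidates, key=lambda t: (t.count("*") + 1, p_values[t]))


# Hypothetical p values; C (p = 0.08) is borderline and is kept.
p = {"A": 0.20, "B": 0.01, "C": 0.08, "A*B": 0.15, "A*B*C": 0.35}
print(next_term_to_remove(p))  # A*B*C
```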