There are two considerations for modeling. The first is parsimony, which means that the best model is the one with the fewest significant parameters and the highest adjusted R² (R²_a) value. Basically, parsimony implies doing more with less: the best model explains the data with only a few simple terms. The opposite of parsimony is overmodeling: adding many terms so that the model explains all the variation in the data. This sounds great until you collect more data and discover that there is a random component that cannot be explained by your earlier model. When we overmodel, we tend to get a poorer fit when new data are analyzed. Since such a model has no predictive value, it serves little practical purpose.
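
To make the tradeoff concrete, here is a minimal sketch (an illustration, not from the text) that fits two nested models by least squares and compares R² with the adjusted value, R²_a = 1 - (1 - R²)(n - 1)/(n - p - 1):

import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)    # the true model is a simple line

def r2_pair(cols, y):
    # Least-squares fit; returns (R^2, adjusted R^2).
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    p = X.shape[1] - 1                       # predictors, excluding the intercept
    return r2, 1.0 - (1.0 - r2) * (len(y) - 1) / (len(y) - p - 1)

print(r2_pair([x], y))                       # parsimonious model: x only
print(r2_pair([x] + [rng.normal(size=n) for _ in range(4)], y))   # plus 4 noise terms

Plain R² can only rise as terms are added; R²_a typically falls when the added terms are noise, flagging the overmodeled fit.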

When considering terms for a statistical model, it is helpful to recall the famous words of George Box: “All models are wrong, but some models are useful.”

The other consideration is that of inheritance: If a factor is removed from the model because it is insignificant, then all its interactions should also be removed from the model. Conversely, if an interaction is significant, then all of its main factors should be retained. For example, if the AC interaction is significant and factor A is only borderline significant (p value near 0.10), it would be best to leave both A and AC in the model.
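
As a simple illustration of the rule, a hypothetical helper (the term names and the 0.10 threshold are assumptions for this sketch) that prunes terms while enforcing inheritance:

ALPHA = 0.10

def prune_with_inheritance(p_values):
    # p_values maps terms such as 'A', 'C', and 'A:C' to their p values.
    keep = {term for term, p in p_values.items() if p <= ALPHA}
    for term in list(keep):
        if ':' in term:                 # a significant interaction retains its parents
            keep.update(term.split(':'))
    return keep

print(prune_with_inheritance({'A': 0.11, 'C': 0.02, 'A:C': 0.03, 'B': 0.40}))
# {'A', 'C', 'A:C'}: A stays because A:C is significant; B is dropped.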

When interactions are significant and one or more of their main factors are insignificant, consider whether the interaction may be confounded with another interaction, with a main factor, or perhaps even with a factor not included in the model. Confounding means that the factors move together, often because of the way in which the data were collected. Randomizing the data collection helps to reduce the instances of confounding.
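
For example, a minimal sketch of randomizing run order (assuming, for illustration, a full 2³ factorial in factors A, B, and C):

import itertools
import random

random.seed(7)
runs = list(itertools.product([-1, +1], repeat=3))   # 2^3 design in standard order
random.shuffle(runs)                                  # randomize the run order
for i, (a, b, c) in enumerate(runs, start=1):
    print(f"run {i}: A={a:+d} B={b:+d} C={c:+d}")

Because the run order no longer tracks any single factor, a lurking variable such as ambient drift is less likely to move in step with a factor and confound its estimate.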

Stepwise regression is a set of techniques for automating the removal of terms in the model based on statistical considerations. There are three basic types of stepwise regression (a sketch of forward selection follows the list):

•  Forward selection. This begins with no parameters and adds them one at a time based on a partial F statistic. In this case, factors are not revisited to see the impact of removing them after other terms have been added.
•  Backward elimination. This begins with all parameters and removes them one at a time based on a partial F statistic. This is basically what we did manually after adding the interaction terms.
•  Stepwise. This is a combination of forward selection and backward elimination.
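
Here is a minimal sketch of forward selection under these rules (the entry threshold F_ENTER and the helper names are assumptions for illustration, not a standard routine):

import numpy as np

F_ENTER = 4.0   # assumed entry threshold; many texts use F near 4 or an alpha-to-enter

def sse(X, y):
    # Residual sum of squares from a least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def forward_select(candidates, y):
    # candidates: dict mapping a term name to its predictor column.
    n = len(y)
    chosen = []
    X = np.ones((n, 1))                     # start from the intercept-only model
    while True:
        best = None
        for name, col in candidates.items():
            if name in chosen:
                continue
            X_new = np.column_stack([X, col])
            df_err = n - X_new.shape[1]
            # Partial F for the added term: (drop in SSE) / (MSE of larger model).
            f = (sse(X, y) - sse(X_new, y)) / (sse(X_new, y) / df_err)
            if best is None or f > best[0]:
                best = (f, name, X_new)
        if best is None or best[0] < F_ENTER:
            return chosen                   # no remaining candidate earns entry
        chosen.append(best[1])              # terms are never revisited once added
        X = best[2]

# Hypothetical data in which y depends only on A and the AC interaction.
rng = np.random.default_rng(3)
m = 40
A = rng.choice([-1.0, 1.0], m)
B = rng.choice([-1.0, 1.0], m)
C = rng.choice([-1.0, 1.0], m)
y = 3 + 2 * A + 1.5 * A * C + rng.normal(0, 0.5, m)
print(forward_select({"A": A, "B": B, "C": C, "AC": A * C}, y))   # e.g. ['A', 'AC']

Backward elimination works the same way in reverse: start with every term and repeatedly drop the one with the smallest partial F until all remaining terms exceed the exit threshold.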