There are two considerations for modeling. The first is paucity, which means that the best model is the one with the fewest significant parameters and the highest adjusted R² value. Basically, paucity implies doing more with less: the best model explains the data with only a few simple terms. The opposite of paucity is overmodeling: adding many terms so that the model explains all the variation in the data. This sounds great until you collect more data and discover a random component that the earlier model cannot explain. When we overmodel, the fit tends to be poorer when new data are analyzed; a model with little predictive value has little practical purpose.
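To see why the adjusted R² is the better yardstick here, consider the following sketch in Python using the statsmodels library. The data-generating model, sample size, and variable names are illustrative assumptions, not data from the text: it fits a parsimonious model and a deliberately overmodeled one to the same simulated data.

```python
# A parsimony illustration with simulated data; the data-generating model,
# sample size, and variable names here are all illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)   # true model: one linear term

# Parsimonious model: intercept plus x
simple = sm.OLS(y, sm.add_constant(x)).fit()

# Overmodeled: intercept plus x plus five pure-noise predictors
noise = rng.normal(size=(n, 5))
overfit = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

# Plain R-squared never decreases as terms are added; adjusted R-squared
# penalizes terms that add no real explanatory power.
print(f"simple : R2={simple.rsquared:.3f}  adj R2={simple.rsquared_adj:.3f}")
print(f"overfit: R2={overfit.rsquared:.3f}  adj R2={overfit.rsquared_adj:.3f}")
```

The overmodeled fit reports the higher plain R², yet its extra terms describe only noise, which is exactly the trap described above.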
When considering terms for a statistical model, it is helpful to recall the
infamous words of George Box: “All models are wrong, but some models are
useful.”
The other consideration is that of inheritance: If a factor is removed from the model because it is insignificant, then all of its interactions also should be removed from the model. Conversely, if an interaction is significant, then all of its main factors should be retained. For example, if the AC interaction is significant and factor A is only borderline significant (p value near 0.10), it would be best to leave both A and AC in the model.
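A minimal sketch of this hierarchical structure, assuming the statsmodels formula interface and an illustrative two-factor data set (the factors, effect sizes, and response are not from the text):

```python
# A hierarchy (inheritance) sketch; factors A and C, their effect sizes, and
# the response y are illustrative assumptions, not data from the text.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=2)
df = pd.DataFrame({"A": rng.choice([-1, 1], 40),
                   "C": rng.choice([-1, 1], 40)})
# A's main effect is small (borderline), but the A:C interaction is strong
df["y"] = 3 + 0.2 * df["A"] + 1.5 * df["A"] * df["C"] + rng.normal(0, 1, 40)

# Because the A:C interaction is retained, both parent terms A and C stay
# in the model, even if one is only borderline significant on its own.
fit = smf.ols("y ~ A + C + A:C", data=df).fit()
print(fit.pvalues.round(3))
```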
When interactions are significant and one or more of their main factors are insignificant, consider whether the interaction may be confounded with another interaction, with a main factor, or perhaps even with a factor not included in the study. Confounding means that the factors move together, often because of the way in which the data were collected. Randomizing the order of data collection helps to reduce the instances of confounding.
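One quick diagnostic is to check how strongly the factor columns in the collected data correlate. The following sketch assumes an illustrative scenario in which factor B unintentionally tracked factor A on most runs:

```python
# A confounding check; the scenario (factor B accidentally tracking factor A
# on most runs) and all names are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
A = rng.choice([-1, 1], 20)
# B ends up equal to A on roughly 90 percent of runs
B = np.where(rng.random(20) < 0.9, A, -A)

# A correlation near +/-1 between factor columns means the factors moved
# together: their effects are confounded and cannot be separated.
print(pd.DataFrame({"A": A, "B": B}).corr())

# Randomizing the run order guards against time-ordered lurking variables
# (warm-up, drift, shift changes) being confounded with the studied factors.
run_order = rng.permutation(20)
print(run_order)
```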
Stepwise regression is a set of techniques for automating the addition and removal of model terms based on statistical criteria. There are three basic types of stepwise regression:
• Forward selection. This begins with no parameters and adds them one at a time based on a partial F statistic. Factors are not revisited to see the impact of removing them after other terms have been added. (A sketch of this approach follows the list.)
• Backward elimination. This begins with all parameters and removes them one at a time based on a partial F statistic. This is basically what we did manually after adding the interaction terms.
• Stepwise. This is a combination of forward selection and backward elimination.
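The following is a hedged sketch of forward selection on an illustrative data set (the columns, effect sizes, and the alpha threshold are assumptions). It selects on each candidate term's t-test p value, which is equivalent to the partial F test when a single term is added at a time:

```python
# A forward-selection sketch; the data set and alpha threshold are
# illustrative assumptions. Each step adds the candidate term with the
# smallest p value until no remaining term passes alpha.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X, y, alpha=0.10):
    chosen, remaining = [], list(X.columns)
    while remaining:
        # Fit one candidate model per remaining term; record that term's p value
        pvals = {t: sm.OLS(y, sm.add_constant(X[chosen + [t]])).fit().pvalues[t]
                 for t in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break                     # no remaining term is significant
        chosen.append(best)           # terms are never revisited once added
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(seed=4)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("ABCD"))
y = 1 + 2 * X["A"] - 1.5 * X["C"] + rng.normal(0, 1, 50)
print(forward_select(X, y))           # typically picks A and C
```

Note the limitation called out in the bullet above: once a term is added, this procedure never reconsiders it, which is why the combined stepwise approach also runs a backward-elimination pass after each addition.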