Page 376 - Six Sigma Demystified

relation coefficient $R^2$ approaches 1 as the number of factors approaches the
number of data values. That is, if we have five factors and eight data values, $R^2$
may be close to 1 regardless of whether the fit is good or not. $R^2$ always increases
as factors are added, whether the factors are significant or not. For multiple
regression models, an adjusted value $R_a^2$ is calculated that corrects for the
number of parameters in the model. $R_a^2$ always will be less than $R^2$ and
provides a better approximation of the amount of variation accounted for by
the model. In the preceding example, the $R_a^2$ statistic is calculated as 0.385;
that is, approximately 39 percent of the variation in the response is explained by
the regression function. Values near 0.7 or higher generally are considered
acceptable.
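
The correction can be sketched in a few lines. This page does not show the book's own formula, so the sketch assumes the commonly used form $R_a^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, where n is the number of data values and p is the number of factors in a model that also has an intercept. The function name and the input values are hypothetical and are not taken from the preceding example.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2, assuming a model with an intercept plus n_predictors terms.

    Assumed formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more data values than fitted parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical illustration: five factors, eight data values, raw R^2 of 0.90.
print(round(adjusted_r2(0.90, 8, 5), 3))
```

With only two residual degrees of freedom, a raw value of 0.90 shrinks to about 0.65, which illustrates why a high $R^2$ from a model with nearly as many factors as data values should not be trusted on its own.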
A large $R^2$ value does not imply that the slope of the regression line is steep,
that the correct model was used, or that the model will predict future
observations accurately. It simply means that the model happens to account for a
large percentage of the variation in this particular set of data.
A t test is performed on each of the model parameters, with a resulting p value
provided. If a parameter's p value is less than 0.05, then the parameter is likely
to be significant and should be retained in the model. (In some cases, such as when
we have limited data from an initial study, we may choose a higher threshold, such
as 0.10.) In this example, it would appear that only factor B is significant; the
p values for factors A and C both greatly exceed even the 0.10 threshold. The
$R_a^2$ of 0.385 indicates a relatively poor fit, implying that the model may be
missing terms.
A variance inflation factor (VIF) also may be evaluated to determine the
presence of multicollinearity, which occurs when parameters are correlated
with one another. Any parameter with a VIF of between 5 and 10 is suspect;
those exceeding a value of 10 should be removed.
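
As a rough sketch of how a VIF might be computed, the code below uses the standard identity that the VIF of parameter j equals the j-th diagonal element of the inverse of the predictors' correlation matrix (equivalently, 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the others). The function names and the sample columns are hypothetical; a statistics package would normally report these values directly.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length predictor columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # a constant column (s == 0) is not supported
        scaled.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in scaled]
            for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(k):
            if row != col:
                f = aug[row][col]
                aug[row] = [v - f * w for v, w in zip(aug[row], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def flag(value):
    """Apply the guideline above: 5 to 10 is suspect, above 10 should be removed."""
    if value > 10:
        return "remove"
    return "suspect" if value >= 5 else "ok"


# Hypothetical, perfectly balanced two-factor design: the columns are uncorrelated.
a = [-1, 1, -1, 1]
b = [-1, -1, 1, 1]
print([flag(v) for v in vif([a, b])])
```

For uncorrelated columns such as these, every VIF equals 1, so nothing is flagged; as two columns become more strongly correlated, both of their VIFs grow.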

Removing Terms from the Multiple Regression Model

When reviewing the results of the t and VIF tests for the individual factors, we
are considering whether the individual factors provide benefit in estimating the
response. When removing terms from the model, remove only one term at a
time because the error is partially reapportioned among the remaining param-
eters when each parameter is removed.
It is recommended to remove higher-order terms first: third-order terms, then
second-order terms, and then higher-order interactions. In fact, we often don't
include higher-order terms in initial studies so that we can eliminate the factors
that are not significant using less data. Factors with borderline significance,
such as a p value between 0.05 and 0.10, are best left in the model, particularly
at the early stages.
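
The one-term-at-a-time procedure can be sketched as a simple loop. This is a minimal illustration, not the book's algorithm: `fit_p_values` is a hypothetical callable that refits the model with the given terms and returns a p value for each, and the p values in the stub are made up for demonstration only. The sketch keeps borderline terms (p value at or below 0.10) and does not enforce the higher-order-first ordering.

```python
def backward_eliminate(terms, fit_p_values, keep_at_or_below=0.10):
    """Drop the least significant term, refit, and repeat until all terms pass."""
    current = list(terms)
    while current:
        p_values = fit_p_values(current)  # refit with the remaining terms only
        worst = max(current, key=lambda t: p_values[t])
        if p_values[worst] <= keep_at_or_below:
            break  # every remaining term is significant or borderline
        current.remove(worst)  # remove exactly one term per refit
    return current


def fake_fit(terms):
    """Stand-in for a real regression fit; the p values below are invented."""
    table = {
        frozenset("ABC"): {"A": 0.45, "B": 0.01, "C": 0.60},
        frozenset("AB"): {"A": 0.08, "B": 0.01},
    }
    return table[frozenset(terms)]


print(backward_eliminate(["A", "B", "C"], fake_fit))
```

In the stub, removing C changes the p value of A from 0.45 to 0.08, mimicking how the error is reapportioned after each removal; removing A and C together on the basis of the first fit would have discarded a term that turns out to be borderline.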
