Page 374 - Six Sigma Demystified
The ANOVA table uses the F statistic to compare the variability accounted
for by the regression model with the remaining variation owing to error. The
null hypothesis is that the coefficients of the regression model are zero; the
alternative hypothesis is that at least one of the coefficients is nonzero and thus
provides some ability to estimate the response.
Although we could use the F-statistic tables in Appendices 4 and 5 to determine whether to reject the null hypothesis, most statistical software will provide a p value for the F statistic to indicate the significance of the model. Most
times we will reject the null hypothesis and assert that the calculated linear
regression model is significant when the p value is less than 0.10. (While a p
value of 0.05 or less is preferred to indicate significance, a value of 0.10 or less
is accepted in preliminary analyses. The assumption is that the parameter will
be retained so that additional analysis can better determine its significance.)
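The F statistic behind this test is the ratio of the mean square for regression to the mean square for error. As a rough sketch (not from the book, and using made-up data), here is how that ratio is computed for a one-factor linear regression; in practice, software converts the F statistic to the p value discussed above.

```python
# Illustrative sketch: computing the ANOVA F statistic for a simple
# one-factor linear regression, using hypothetical data.

def f_statistic(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    preds = [b0 + b1 * x for x in xs]
    ssr = sum((p - y_bar) ** 2 for p in preds)          # regression sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # error sum of squares
    msr = ssr / 1                  # 1 regression degree of freedom
    mse = sse / (n - 2)            # n - 2 error degrees of freedom
    return msr / mse

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
F = f_statistic(xs, ys)
# A large F (relative to the tabled critical value, or equivalently a small
# p value from software) leads us to reject the null hypothesis.
print(F > 10)
```

Because this hypothetical data is nearly linear, the F statistic is very large and the null hypothesis of zero coefficients would be rejected at any reasonable significance level.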
Bear in mind that statistical significance may or may not indicate physical
significance. If we measure the statistical significance of a given factor to a
response, this does not necessarily mean that the factor is in fact significant in
predicting the response in the real world. Factors may happen to vary coinci-
dent with other factors, some of which may be significant. For example, if we
estimate that shoe size is statistically significant in understanding the variation
in height, it does not mean that shoe size is a good predictor of height, nor
should it imply the causal relation that increasing shoe size increases height.
In this example, the calculated p value for the F test is 0.032, so the null
hypothesis that all the coefficients are zero is rejected.
The regression model shown in the Minitab and Excel analyses above is
Response = 45.6 + 0.288(factor A) – 0.380(factor B) + 0.0111(factor C)
The regression model represents our best estimate of future values of y based
on given values of each significant factor. For example, when there are 10 units
of factor A, 1 unit of factor B, and 100 units of factor C, the best estimate for
the response y is
Response = 45.6 + 0.288(10) – 0.380(1) + 0.0111(100) = 49.21
Similarly, we could calculate values of the response y for any value of input
factors x. Recall that extrapolation beyond our data region should be done with
caution.
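The worked prediction above can be sketched as a short function; the coefficients are those reported in the Minitab/Excel output shown in the text.

```python
# Minimal sketch of evaluating the fitted regression model from the text
# at given factor levels (coefficients taken from the software output).

def predict(a, b, c):
    """Estimated response y for levels of factors A, B, and C."""
    return 45.6 + 0.288 * a - 0.380 * b + 0.0111 * c

y_hat = predict(10, 1, 100)
print(round(y_hat, 2))  # 49.21, matching the worked example
```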
Each coefficient β_i indicates the predicted change in y for a unit change in that x_i when all other terms are held constant. For example, β_1 = 0.288 implies that the response increases by 0.288 units for each additional unit of factor A.
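This interpretation can be checked numerically: with the model from the text, increasing factor A by one unit (holding B and C fixed) changes the prediction by exactly the coefficient of A.

```python
# Sketch: the per-unit effect of factor A equals its coefficient, 0.288,
# regardless of the other factor levels (model coefficients from the text).

coef_a = 0.288

def response(a, b, c):
    # Re-states the fitted regression model from the text.
    return 45.6 + coef_a * a - 0.380 * b + 0.0111 * c

delta = response(11, 1, 100) - response(10, 1, 100)
print(round(delta, 3) == coef_a)
```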