Page 375 - Six Sigma Demystified
Part 3 Six Sigma Tools
Generally, the magnitude of the β coefficients does not provide an indication
of the significance or impact of the factor. In the example equation below, we
cannot say that factor A is more critical than factor C simply because $\beta_1$ is
larger than $\beta_3$, because the scaling (or unit of measure) of each factor may be
different. Some software (such as Minitab) will provide the regression function
in coded form, in which case the coefficients are applied to coded values of the
factors (such as –1 and +1), allowing direct comparison to estimate the effects
of the factors.
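To illustrate the scaling point, the sketch below fits the same hypothetical, noise-free data twice with NumPy least squares: once in natural units and once with each factor coded to –1/+1 using its center and half-range (a common coding convention; the exact output of Minitab or any other package is not reproduced here). Factor A is assumed to be time in hours and factor C temperature in degrees Celsius; the coefficient of A plays the role of β1 and that of C the role of β3. The data, units, and helper names (`code_factor`, `fit_linear`) are all hypothetical.

```python
import numpy as np

# Hypothetical 2x2 design plus a center point (not from the text).
# Factor A: time in hours (1 to 3); factor C: temperature in deg C (150 to 200).
A = np.array([1.0, 3.0, 1.0, 3.0, 2.0])
C = np.array([150.0, 150.0, 200.0, 200.0, 175.0])
# Noise-free response built from y = 10 + 2*A + 0.2*C, so the true model is known.
y = 10 + 2 * A + 0.2 * C

def code_factor(x):
    """Linearly map a factor's low/high levels to -1/+1 (center and half-range)."""
    center = (x.max() + x.min()) / 2
    half_range = (x.max() - x.min()) / 2
    return (x - center) / half_range

def fit_linear(factors, y):
    """Least-squares fit of y = b0 + b1*f1 + b2*f2 + ...; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones_like(y)] + list(factors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

uncoded = fit_linear([A, C], y)
coded = fit_linear([code_factor(A), code_factor(C)], y)
print("uncoded [b0, bA, bC]:", np.round(uncoded, 3))  # A's coefficient looks 10x larger
print("coded   [b0, bA, bC]:", np.round(coded, 3))    # on the -1/+1 scale, C's is larger
```

In natural units A's coefficient (2) is ten times C's (0.2), yet over the experimental ranges C moves the response more: its coded coefficient is 5 versus 2 for A. Each coded coefficient equals half the change in y as that factor goes from its low level to its high level.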
Once we have constructed a model, we can check it in several ways, most
notably through residuals analysis (discussed next), which looks for patterns
in the residuals.
A confidence interval for the regression line may be constructed to indicate
how precisely the regression function has been estimated. The confidence limits
are farthest apart at the ends and closest together in the middle, which may be
explained in either of two ways:
1. The regression function for the fitted line requires estimation of two
parameters: slope and y intercept. The error in estimating the intercept
(strictly, the height of the line at the center of the data) shifts the line up
or down, producing a gap in the vertical direction. The error in estimating
the slope can be visualized by imagining the fitted line rotating about its
middle. Together these produce the hourglass-shaped region shown by the
confidence intervals.
2. The center of the data is located near the middle of the fitted line. The
regression function should be estimated most precisely at the center of the
data; hence the confidence limits are narrower in the middle. Estimates at
the extreme conditions are much less precise, resulting in a wider band at
each end.
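For simple linear regression with the usual assumptions (independent, normally distributed errors with constant variance), the standard textbook form of the confidence interval on the fitted line makes both explanations concrete. The notation here is introduced for illustration and is not taken from the text: $\hat{y}(x_0)$ is the fitted value at $x_0$, $s$ is the residual standard error, $n$ is the number of observations, and $t_{\alpha/2,\,n-2}$ is the two-sided $t$ critical value.

```latex
\hat{y}(x_0) \;\pm\; t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}},
\qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The $1/n$ term is the uncertainty in the height of the line at $\bar{x}$ (the vertical gap), and the $(x_0-\bar{x})^2/S_{xx}$ term comes from uncertainty in the slope (the line pivoting about its center). Because the second term is zero at $x_0 = \bar{x}$ and grows quadratically away from it, the band is narrowest at the center of the data and widens toward the ends.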
Don’t confuse the confidence interval on the line with a prediction interval
for new data. If we assume that the new data are independent of the data used
to calculate the fitted regression line, then a prediction interval for future
observations depends on the error in the fitted regression model plus the
random error of the new observation itself. While our best estimate for the y
value at a given x value is found by solving the regression equation, we
recognize that the actual y values observed will vary around that estimate.
Thus the prediction interval has a shape similar to that of the confidence
interval but is wider.
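A small numerical sketch can compare the two intervals. It uses hypothetical data and the standard textbook formulas for simple linear regression, which assume independent, normally distributed errors with constant variance. With $h = 1/n + (x_0-\bar{x})^2/S_{xx}$ and $s$ the residual standard error, the confidence-interval half-width is $t \cdot s\sqrt{h}$ and the prediction-interval half-width is $t \cdot s\sqrt{1+h}$; the extra 1 accounts for the variance of the new observation itself. The function name `interval_half_widths` and the hard-coded t value are assumptions for illustration.

```python
import numpy as np

# Hypothetical data (not from the text): x = 1..8, y roughly linear in x.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Two-sided 95% t value for n - 2 = 6 degrees of freedom, taken from a t table
# (scipy.stats.t.ppf(0.975, 6) returns about 2.447).
T_CRIT = 2.447

def interval_half_widths(x, y, x0, t_crit=T_CRIT):
    """Return (fitted value, CI half-width, PI half-width) at the point(s) x0."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)          # degree-1 fit: [slope, intercept]
    residuals = y - (intercept + slope * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))   # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    h = 1 / n + (x0 - x.mean()) ** 2 / sxx          # leverage of x0
    y_hat = intercept + slope * x0
    ci = t_crit * s * np.sqrt(h)                    # uncertainty in the fitted line
    pi = t_crit * s * np.sqrt(1 + h)                # fitted line + a new observation
    return y_hat, ci, pi

# Evaluate at the low end, the center, and the high end of the data.
x0 = np.array([x.min(), x.mean(), x.max()])
y_hat, ci, pi = interval_half_widths(x, y, x0)
for xi, yi, c, p in zip(x0, y_hat, ci, pi):
    print(f"x={xi:.1f}  y_hat={yi:.2f}  CI ±{c:.2f}  PI ±{p:.2f}")
```

For these symmetric x values both intervals are narrowest at the mean of x and equally wide at the two ends, and the prediction interval is wider everywhere. Unlike the confidence band, the prediction band does not shrink toward zero as more data are collected, because the 1 under the square root does not depend on n.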
Another useful statistic provided by the ANOVA table is the coefficient of
determination ($R^2$), which is the square of the Pearson correlation coefficient
$R$. $R^2$ varies between 0 and 1 and indicates the proportion of variation in the data
accounted for by the regression model. In multiple regression, the Pearson cor-