Page 156 - Statistics and Data Analysis in Geology
Analysis of Multivariate Data
Deviation          1134.12   43   26.38
Total Variation    2934.82   49
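The two table rows above are enough to recover the remaining analysis-of-variance quantities by subtraction. The following is a minimal sketch, not the book's own computation: it assumes the standard partition in which the regression sum of squares and degrees of freedom equal the total minus the deviation, and the variable names are illustrative only.

```python
# Values copied from the Deviation and Total Variation rows of the table.
ss_deviation, df_deviation = 1134.12, 43
ss_total, df_total = 2934.82, 49

# Assumption: regression row obtained by difference (SST = SSR + SSD).
ss_regression = ss_total - ss_deviation
df_regression = df_total - df_deviation

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_deviation = ss_deviation / df_deviation

# F-ratio for the significance of the regression, and goodness of fit.
f_ratio = ms_regression / ms_deviation
r_squared = ss_regression / ss_total

print(f"SSR = {ss_regression:.2f} with {df_regression} d.f.")
print(f"MSR = {ms_regression:.2f}, MSD = {ms_deviation:.2f}")
print(f"F = {f_ratio:.2f}, R^2 = {r_squared:.3f}")
```

The computed mean square due to deviation should agree with the tabulated 26.38, which is a useful check that the rows have been read correctly.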
Increasing the number of independent variables in the regression equation will
always increase the SSR (except in the situation where a new variable is perfectly
correlated with a previous variable). However, the increase may not be significant.
The loss of degrees of freedom for deviations may offset the reduction in SSD, and
actually increase the mean squares due to deviation. If this happens, the F-ratio
for the significance of the regression will decrease, and the addition of another
variable has actually detracted from the regression. To determine the very best
possible regression (in the sense of having the most significant F-ratio), all possible
combinations of the variables would have to be examined. This is possible if we
are dealing with few variables, but the number of possible variable combinations
is equal to 2^m - 1, and the computational effort is formidable if m is large. Other
procedures are available which yield a nearly optimal regression with much less
effort. These include schemes such as the backward elimination procedure, the
forward selection procedure, stepwise regression, and stagewise regression. These
methods may not find identical regression equations in a large selection of possible
variables, but all will produce approximately equivalent results. A consideration of
each is beyond the scope of this book; we will be content with a brief description