
             of  standard deviation, they may be compared directly with each other to determine
             the most effective variables.
                 To compute the matrix of  sums of  squares and products necessary in the nor-
             mal equation set, we found the diagonal entries, $\sum x_k^2$.  It is a simple matter to
             convert these sums of squares to corrected sums of squares, $\mathrm{SS}_k$, and then to
             the standard deviations necessary to compute the partial correlation coefficients.
             However, it is possible to solve the normal equations in a manner that will yield the
             standardized partial regression coefficients directly, and gain an important com-
             putational advantage in the process.
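As a brief illustration of that conversion (a NumPy sketch; the function name and data layout are ours, not the text's), the corrected sums of squares and the standard deviations follow directly from the raw column sums and sums of squares:

```python
import numpy as np

def corrected_ss_and_sd(x):
    """From the raw sums of squares (the diagonal entries of the
    sums-of-squares-and-products matrix), compute the corrected sums
    of squares SS_k and the sample standard deviations s_k."""
    n = x.shape[0]
    raw_ss = (x ** 2).sum(axis=0)         # sum of x_k^2 for each variable
    correction = x.sum(axis=0) ** 2 / n   # (sum of x_k)^2 / n
    ss = raw_ss - correction              # corrected sums of squares, SS_k
    sd = np.sqrt(ss / (n - 1))            # standard deviations
    return ss, sd
```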
                 The major sources of  error in multiple regression occur in the creation of  the
             entries in the Sn matrix and during the inversion process. The sums of  squares of
             the variables may become so large that significant digits are lost by truncation.  If
             the entries in the Sn matrix differ greatly in their magnitudes, an additional loss
             of  digits may occur during inversion, especially if high correlations exist among the
             variables.  Some computer programs may be capable of  retaining only one or two
             significant digits in the coefficients, and with certain data sets retention may even
             be worse.  Studies have shown that calculations using double-precision arithmetic
             may not be  sufficient to overcome this problem.  However, a few simple modifi-
             cations in our computational procedure will gain us two to six significant digits
             during computation and greatly increase the accuracy of  the computed regression
             (Longley, 1967, p. 821-827).
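The scale of this digit loss is easy to demonstrate. In the sketch below (synthetic data invented for illustration, not Longley's economic series), the condition number of the raw sums-of-squares-and-products matrix dwarfs that of the correlation matrix formed from the same variables, anticipating the remedy developed in the next paragraph:

```python
import numpy as np

# Two predictors with large absolute magnitudes that are nearly collinear.
rng = np.random.default_rng(0)
x1 = 1.0e6 + rng.normal(0.0, 1.0, size=50)
x2 = x1 + rng.normal(0.0, 0.5, size=50)      # almost a copy of x1

X = np.column_stack([np.ones(50), x1, x2])   # design matrix with intercept
Sn = X.T @ X                                 # raw sums of squares and products
Rxx = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)

# The condition number bounds how many significant digits survive inversion.
print(f"cond of raw matrix:         {np.linalg.cond(Sn):.2e}")
print(f"cond of correlation matrix: {np.linalg.cond(Rxx):.2e}")
```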
                 The most obvious step that can be taken is to convert all observations to devia-
             tions from the mean. This reduces the absolute magnitude of variables and centers
             them about a common mean of  zero. As an inevitable consequence, the coefficient
              $b_0$ will become zero, so the matrix equation can be reduced by one row and one
              column.  This simple step may gain several significant digits.  However, we  also
             may reduce the size of  entries in the matrix still further by converting them all to
              correlations. This is equivalent to expressing the original variables in the standard
             normal form of  zero mean and unit standard deviation.  The matrix equation for
             regression then has the form

$$\mathbf{R}_{xx}\,\mathbf{B} = \mathbf{R}_{xy}$$
             which can be solved by the operation


$$\mathbf{B} = \mathbf{R}_{xx}^{-1}\,\mathbf{R}_{xy} \tag{6.8}$$

             where $\mathbf{R}_{xy}$ represents the column vector of correlations between $y$ and the $x_k$
             independent variables.  The $m \times m$ matrix of correlations between the $x_k$ variables
             is represented by $\mathbf{R}_{xx}$.  For example, the normal equation for three independent
             variables has the form

$$\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix} = \begin{bmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \end{bmatrix}$$
              Note that the equation has one less row and column than the equivalent equation
              using the original variables (Eq. 6.5).
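A minimal sketch of solving Equation (6.8) numerically, assuming the observations are held in a predictor matrix X (whose columns are the $x_k$ variables) and a response vector y; the function name is illustrative:

```python
import numpy as np

def standardized_regression(X, y):
    """Solve Rxx B = Rxy (Eq. 6.8) for the standardized partial
    regression coefficients B."""
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx = R[:-1, :-1]   # m x m correlations among the x_k variables
    Rxy = R[:-1, -1]    # correlations between each x_k and y
    # Solving the system directly is numerically safer than forming
    # the explicit inverse of Rxx.
    return np.linalg.solve(Rxx, Rxy)
```

If the conventional coefficients are needed afterward, each $B_k$ rescales to $b_k = B_k\, s_y / s_k$, and the intercept follows from $b_0 = \bar{y} - \sum_k b_k \bar{x}_k$.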
                  Computing the regression equation in standardized form has the disadvantage
              that the correlation matrix must be created first, increasing the computational ef-
              fort.  In order to preserve accuracy, the correlations must be calculated using the

