
             of  standard deviation, they may be compared directly with each other to determine
             the most effective variables.
                 To compute the matrix of  sums of  squares and products necessary in the nor-
             mal equation set, we found the diagonal entries, $\sum x_k^2$.  It is a simple matter to
             convert these sums of squares to corrected sums of squares, $\mathrm{SS}_k$, and then to
             the standard deviations necessary to compute the partial correlation coefficients.
             However, it is possible to solve the normal equations in a manner that will yield the
             standardized partial regression coefficients directly, and gain an important com-
             putational advantage in the process.
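As a brief illustration of that conversion (a NumPy sketch; the function name and data layout are ours, not the text's), the corrected sums of squares and the standard deviations follow directly from the raw column sums and sums of squares:

```python
import numpy as np

def corrected_ss_and_sd(x):
    """From the raw sums of squares (the diagonal entries of the
    sums-of-squares-and-products matrix), compute the corrected sums
    of squares SS_k and the sample standard deviations s_k."""
    n = x.shape[0]
    raw_ss = (x ** 2).sum(axis=0)         # sum of x_k^2 for each variable
    correction = x.sum(axis=0) ** 2 / n   # (sum of x_k)^2 / n
    ss = raw_ss - correction              # corrected sums of squares, SS_k
    sd = np.sqrt(ss / (n - 1))            # standard deviations
    return ss, sd
```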
                 The major sources of  error in multiple regression occur in the creation of  the
             entries in the Sn matrix and during the inversion process. The sums of  squares of
             the variables may become so large that significant digits are lost by truncation.  If
             the entries in the Sn matrix differ greatly in their magnitudes, an additional loss
             of  digits may occur during inversion, especially if high correlations exist among the
             variables.  Some computer programs may be capable of  retaining only one or two
             significant digits in the coefficients, and with certain data sets retention may even
             be worse.  Studies have shown that calculations using double-precision arithmetic
             may not be  sufficient to overcome this problem.  However, a few simple modifi-
             cations in our computational procedure will gain us two to six significant digits
             during computation and greatly increase the accuracy of  the computed regression
             (Longley, 1967, p. 821-827).
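The scale of this digit loss is easy to demonstrate. In the sketch below (synthetic data invented for illustration, not Longley's economic series), the condition number of the raw sums-of-squares-and-products matrix dwarfs that of the correlation matrix formed from the same variables, anticipating the remedy developed in the next paragraph:

```python
import numpy as np

# Two predictors with large absolute magnitudes that are nearly collinear.
rng = np.random.default_rng(0)
x1 = 1.0e6 + rng.normal(0.0, 1.0, size=50)
x2 = x1 + rng.normal(0.0, 0.5, size=50)      # almost a copy of x1

X = np.column_stack([np.ones(50), x1, x2])   # design matrix with intercept
Sn = X.T @ X                                 # raw sums of squares and products
Rxx = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)

# The condition number bounds how many significant digits survive inversion.
print(f"cond of raw matrix:         {np.linalg.cond(Sn):.2e}")
print(f"cond of correlation matrix: {np.linalg.cond(Rxx):.2e}")
```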
                 The most obvious step that can be taken is to convert all observations to devia-
             tions from the mean. This reduces the absolute magnitude of variables and centers
             them about a common mean of  zero. As an inevitable consequence, the coefficient
              $b_0$ will become zero, so the matrix equation can be reduced by one row and one
              column.  This simple step may gain several significant digits.  However, we  also
             may reduce the size of  entries in the matrix still further by converting them all to
              correlations. This is equivalent to expressing the original variables in the standard
             normal form of  zero mean and unit standard deviation.  The matrix equation for
             regression then has the form

$$\mathbf{R}_{xx}\,\mathbf{B} = \mathbf{R}_{xy}$$
             which can be solved by the operation


$$\mathbf{B} = \mathbf{R}_{xx}^{-1}\,\mathbf{R}_{xy} \tag{6.8}$$

             where $\mathbf{R}_{xy}$ represents the column vector of correlations between $y$ and the $x_k$
             independent variables.  The $m \times m$ matrix of correlations between the $x_k$ variables
             is represented by $\mathbf{R}_{xx}$.  For example, the normal equation for three independent
             variables has the form

$$\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix} = \begin{bmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \end{bmatrix}$$
              Note that the equation has one less row and column than the equivalent equation
              using the original variables (Eq. 6.5).
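A minimal sketch of solving Equation (6.8) numerically, assuming the observations are held in a predictor matrix X (whose columns are the $x_k$ variables) and a response vector y; the function name is illustrative:

```python
import numpy as np

def standardized_regression(X, y):
    """Solve Rxx B = Rxy (Eq. 6.8) for the standardized partial
    regression coefficients B."""
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx = R[:-1, :-1]   # m x m correlations among the x_k variables
    Rxy = R[:-1, -1]    # correlations between each x_k and y
    # Solving the system directly is numerically safer than forming
    # the explicit inverse of Rxx.
    return np.linalg.solve(Rxx, Rxy)
```

If the conventional coefficients are needed afterward, each $B_k$ rescales to $b_k = B_k\, s_y / s_k$, and the intercept follows from $b_0 = \bar{y} - \sum_k b_k \bar{x}_k$.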
                  Computing the regression equation in standardized form has the disadvantage
              that the correlation matrix must be created first, increasing the computational ef-
              fort.  In order to preserve accuracy, the correlations must be calculated using the

