Page 157 - Statistics and Data Analysis in Geology

Statistics and Data Analysis in  Geology - Chapter 6

             of one of the techniques. These methods are well described in some of the texts
             listed in the Selected Readings at the end of the chapter, especially in Marascuilo
             and Levin (1983) and in Draper and Smith (1998).
                 The backward elimination procedure begins by computing a regression that
             includes all possible variables and then selecting the least significant variable.
             The selection proceeds by examining the standardized partial regression coefficients
             for the one smallest in absolute value and then recomputing the regression, omitting
             that variable. The significance of the deleted variable is tested by the analysis of
             variance shown in Table 6-3. If the variable is not making a significant contribution
             to the regression, it is permanently discarded. The reduced regression model is then
             fitted to the data, a new set of standardized partial regression coefficients for the
             reduced equation is calculated, and the process is repeated. At each step, the
             regression equation is reduced by one variable, until all remaining variables are
             significant.
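
The elimination loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure as implemented in any particular program: the significance test is written as a partial F-test on the extra sum of squares (assumed here to correspond to the analysis of variance of Table 6-3), the function names and the fixed critical value `f_crit` are hypothetical, and the data are synthetic values invented for the illustration (they are not taken from KENTUCKY.TXT).

```python
from statistics import mean, stdev

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(columns, y):
    """Least-squares fit with an intercept; returns (slopes, residual sum of squares)."""
    yc = [v - mean(y) for v in y]
    if not columns:                      # intercept-only model
        return [], sum(v * v for v in yc)
    xc = [[v - mean(col) for v in col] for col in columns]
    sxx = [[sum(p * q for p, q in zip(ci, cj)) for cj in xc] for ci in xc]
    sxy = [sum(p * q for p, q in zip(ci, yc)) for ci in xc]
    b = solve(sxx, sxy)
    resid = [yc[i] - sum(bj * cj[i] for bj, cj in zip(b, xc)) for i in range(len(y))]
    return b, sum(r * r for r in resid)

def backward_eliminate(data, y, f_crit):
    """Drop variables one at a time until the weakest remaining one is significant.

    data maps variable names to lists of observations; f_crit is an assumed
    critical F value supplied by the caller. Assumes more observations than
    variables and a nonzero residual sum of squares.
    """
    keep = list(data)
    sy = stdev(y)
    while keep:
        b, sse_full = fit([data[k] for k in keep], y)
        # Standardized partial regression coefficients: |b_k| * s_k / s_y
        std_b = {k: abs(bk) * stdev(data[k]) / sy for k, bk in zip(keep, b)}
        weakest = min(keep, key=std_b.get)
        reduced = [k for k in keep if k != weakest]
        _, sse_reduced = fit([data[k] for k in reduced], y)
        df_resid = len(y) - len(keep) - 1
        # Partial F: extra sum of squares due to the candidate over residual mean square
        f_stat = (sse_reduced - sse_full) / (sse_full / df_resid)
        if f_stat >= f_crit:             # weakest variable is significant: stop
            break
        keep = reduced                   # otherwise discard it permanently
    return keep

# Synthetic illustration: y depends strongly on x1 and x2, only weakly on x3.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
noise = [-0.5, 0.5, 0.5, -0.5, 0.5, -0.5, -0.5, 0.5]
y = [10 + 4 * a + 2 * b + 0.1 * c + e
     for a, b, c, e in zip(data["x1"], data["x2"], data["x3"], noise)]
kept = backward_eliminate(data, y, f_crit=4.0)
print(kept)
```

In practice the critical value would be taken from an F table for 1 and n - m - 1 degrees of freedom at each step rather than held fixed; a single value is used here only to keep the sketch short.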
                 It is instructive to examine the collection of six independent variables
             measured on river basins (file KENTUCKY.TXT) and see if any can be discarded without
             significantly affecting the multiple regression on basin magnitude. We can find a
             minimal set of regressions by examining the standardized partial regression
             coefficients, deleting the smallest of these, and recomputing the regression.
             Repeatedly running a multiple-regression program obviously is less efficient than
             using a stepwise computer program, but it has the advantage that every step in the
             process can be examined closely. When you are confident that you understand the
             elimination process and the changes that occur in the regression coefficients, you
             may turn to a more automated procedure.
                 Although multiple regression is “multivariate” in the sense that more than one
             variable is measured on each observational unit, it really is a univariate technique
             because we are concerned only with the variance of one variable, y. Behavior of
             the independent variables, the x’s, is not subject to analysis.
                 The next topic we will consider is discriminant function analysis, which
             involves identification or the placing of objects into predefined groups. The
             discrimination between two alternative groups is a process that is computationally
             intermediate between univariate procedures and true multivariate methods in which
             many variables are considered simultaneously. Two groups, each characterized by
             a set of multiple variables, can be discriminated by solving a set of simultaneous
             equations almost identical to those involved in multiple regression. The right-hand
             vector of the matrix equation, however, does not contain cross products between
             independent variables and a single dependent variable, but rather differences
             between the multivariate means of the two groups that are to be discriminated.
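
To make the parallel with regression concrete, the following Python sketch solves the two-group system with the pooled variance-covariance matrix on the left and the vector of mean differences on the right. It is a sketch under assumptions rather than a definitive implementation: the use of the pooled covariance matrix as the coefficient matrix, the midpoint score `r0` used as a dividing value, all function names, and the two small groups of observations are introduced here for illustration only.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def sums_of_products(group):
    """Corrected sums of squares and cross products; rows of group are observations."""
    p = len(group[0])
    means = [mean(col) for col in zip(*group)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in group)
             for j in range(p)] for i in range(p)]

def discriminant(group_a, group_b):
    """Return (coefficients, midpoint score) of a two-group linear discriminant."""
    p = len(group_a[0])
    ssp_a, ssp_b = sums_of_products(group_a), sums_of_products(group_b)
    dof = len(group_a) + len(group_b) - 2
    pooled = [[(ssp_a[i][j] + ssp_b[i][j]) / dof for j in range(p)] for i in range(p)]
    mean_a = [mean(col) for col in zip(*group_a)]
    mean_b = [mean(col) for col in zip(*group_b)]
    # Right-hand vector: differences between the two multivariate means
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    lam = solve(pooled, diff)
    # Score of the point midway between the two group means
    r0 = sum(l * (a + b) / 2 for l, a, b in zip(lam, mean_a, mean_b))
    return lam, r0

def score(lam, x):
    """Discriminant score of a single observation x."""
    return sum(l * v for l, v in zip(lam, x))

# Invented observations on two variables for two groups (illustration only).
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 4.0)]
group_b = [(4.0, 1.0), (5.0, 2.0), (6.0, 1.0), (5.0, 0.0)]
lam, r0 = discriminant(group_a, group_b)
for obs in group_a + group_b:
    label = "A" if score(lam, obs) > r0 else "B"
    print(obs, label)
```

Only the right-hand side differs from the regression case: replacing `diff` with the vector of cross products between the independent variables and y, and `pooled` with the sums of products of the independent variables, would turn the same `solve` call into a least-squares fit.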
                 Tests of discriminant functions involve multivariate extensions of simple
             univariate statistical tests of equality. These will be considered next, followed by
             a discussion of multivariate classification, or the sorting of objects into
             homogeneous groups. We will then consider eigenvector techniques, including principal
             component and factor analysis. The final topics will include multivariate extensions
             of discriminant analysis and multiple regression.
                 This list of topics is certainly not all-inclusive. However, the subjects have
             been chosen because they have found special utility in the Earth sciences. They
             include a wide variety of computational techniques and encompass many fundamental
             concepts. An understanding of the theory and operational procedures involved in
             these methods should provide you with a sufficient background to evaluate other
             multivariate techniques as well.
