Page 160 - Statistics and Data Analysis in Geology

Analysis of Multivariate Data

             multivariate means of the two groups. In matrix notation, we must solve an
             equation of the form
                                     $S\lambda = D$                             (6.13)

             where S is an $m \times m$ matrix of pooled variances and covariances of the m variables.
             The coefficients of the discriminant equation are represented by a column vector of
             the unknown lambdas. Lowercase lambdas ($\lambda$) are used by convention to represent
             the coefficients of the discriminant function. These are exactly the same as the
             betas ($\beta$) used (also by convention) in regression equations. They should not be
             confused with lambdas used to represent eigenvalues in principal component or
             factor analyses.
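
             As a minimal sketch of how an equation of this form yields the lambdas, the
             following Python function solves S λ = D by Gaussian elimination with partial
             pivoting. This is used here in place of an explicit matrix inverse; both give the
             same coefficients when S is nonsingular. The function name `solve_linear_system`
             and the 2 x 2 numbers are illustrative assumptions, not values from the text.

```python
def solve_linear_system(S, D):
    """Solve S * lam = D for lam by Gaussian elimination with partial pivoting."""
    m = len(D)
    # Build the augmented matrix [S | D]; copying keeps the inputs unmodified.
    aug = [list(map(float, S[r])) + [float(D[r])] for r in range(m)]
    for col in range(m):
        # Pick the row with the largest absolute entry in this column as pivot.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("S is singular; the coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The matrix is now diagonal, so each unknown is a simple ratio.
    return [aug[r][m] / aug[r][r] for r in range(m)]


# Hypothetical pooled variance-covariance matrix and mean differences.
S = [[2.0, 1.0], [1.0, 3.0]]
D = [1.0, 2.0]
lam = solve_linear_system(S, D)
print(lam)
```

             For this made-up example the coefficients come out as approximately 0.2 and 0.6,
             which can be checked by substitution: 2(0.2) + 1(0.6) = 1 and 1(0.2) + 3(0.6) = 2.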
                 The right-hand side of the equation consists of the column vector of m
             differences between the means of the two groups, which we will refer to as A and B. You
             will recall from Chapter 3 that such an equation can be solved by inversion and
             multiplication, as
                                     $\lambda = S^{-1} D$                            (6.14)

             where $S^{-1}$ is the inverse of the variance-covariance matrix formed by pooling the
             matrices of the sums of squares and cross products of the two groups, A and B. To
             compute the discriminant function, we must determine the various entries in the
             matrix equation. The mean differences are found simply by


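                 $D_j = \bar{A}_j - \bar{B}_j = \frac{\sum_{i=1}^{n_a} a_{ij}}{n_a} - \frac{\sum_{i=1}^{n_b} b_{ij}}{n_b}$          (6.15)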


                 In this notation, $a_{ij}$ is the ith observation on variable j in group A and $\bar{A}_j$
             is the mean of variable j in group A, which is the arithmetic average of the $n_a$
             observations of variable j in group A. The same conventions apply to group B. The
             multivariate means of groups A and B can be regarded as forming two vectors. The
             difference between these multivariate means therefore also forms a vector

                                     $D = \bar{A} - \bar{B}$
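             or, in expanded form,

                 $\begin{bmatrix} D_1 \\ D_2 \\ \vdots \\ D_m \end{bmatrix} = \begin{bmatrix} \bar{A}_1 \\ \bar{A}_2 \\ \vdots \\ \bar{A}_m \end{bmatrix} - \begin{bmatrix} \bar{B}_1 \\ \bar{B}_2 \\ \vdots \\ \bar{B}_m \end{bmatrix}$
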
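             The calculation of the mean differences can be sketched in a few lines of Python.
             This is only an illustration: the helper names `column_means` and
             `mean_differences` and the small data sets are hypothetical, with each group
             stored as a list of observations (rows) on m = 2 variables (columns).

```python
def column_means(group):
    """Return the mean of each variable (column) over all observations (rows)."""
    n = len(group)
    m = len(group[0])
    return [sum(row[j] for row in group) / n for j in range(m)]


def mean_differences(group_a, group_b):
    """Return the vector D whose jth entry is mean(A_j) - mean(B_j)."""
    mean_a = column_means(group_a)
    mean_b = column_means(group_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


# Hypothetical data: three observations in group A, two in group B.
group_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
group_b = [[0.0, 1.0], [2.0, 1.0]]

D = mean_differences(group_a, group_b)
print(D)
```

             Here the group means are (3, 4) and (1, 1), so the difference vector is (2, 3).
             The two groups need not contain the same number of observations.
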
                 To construct the matrix of pooled variances and covariances, we must compute
             a matrix of sums of squares and cross products of all variables in group A and a
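             similar matrix for group B. For example, considering only group A,

                 $\mathrm{SPA}_{jk} = \sum_{i=1}^{n_a} a_{ij} a_{ik} - \frac{\sum_{i=1}^{n_a} a_{ij} \sum_{i=1}^{n_a} a_{ik}}{n_a}$
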
             Here, $a_{ij}$ denotes the ith observation of variable j in group A as before, and $a_{ik}$
             denotes the ith observation of variable k in the same group. Of course, this quantity
             will be the sum of squares of variable k whenever j = k. Similarly, a matrix of sums
             of squares and cross products can be found for group B:
