Page 169 - Statistics and Data Analysis in Geology
Statistics and Data Analysis in Geology - Chapter 6
American statistician who formulated this generalization of Student’s t. When all
operations are complete, we find that the test statistic can be expressed as
$$T^2 = n(\bar{\mathbf{X}} - \boldsymbol{\mu})' \mathbf{S}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu}) \tag{6.30}$$
That is, the arbitrary vector A is equal to the vector of differences between the
means, $(\bar{\mathbf{X}} - \boldsymbol{\mu})$. We must find the inverse of the variance-covariance matrix, premultiply
this inverse by a row vector of differences, $(\bar{\mathbf{X}} - \boldsymbol{\mu})'$, and then postmultiply
by a column vector of these same differences. The test statistic is a multivariate
extension of the t-statistic, Hotelling’s $T^2$. Critical values of $T^2$ can be determined
by the relation
$$F = \frac{n - m}{m(n - 1)}\, T^2 \tag{6.31}$$
where n is the number of observations and m is the number of variables, allowing us
to use conventional F-tables rather than special tables of the $T^2$ distribution. More
complete discussions of this and related tests are given in texts on multivariate
statistics such as Overall and Klett (1983), Harris (1985), Krzanowski (1988), and
Morrison (1990).
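The claim that $T^2$ generalizes Student’s t can be checked by setting m = 1 in Equations (6.30) and (6.31). The short derivation below is a sketch of that special case; the lowercase symbols $\bar{x}$, $\mu$, and $s^2$ are introduced here for the single-variable mean, population mean, and sample variance and are not notation taken from the text.

```latex
% Special case m = 1: the matrix S reduces to the scalar variance s^2.
\begin{align*}
T^2 &= n(\bar{x} - \mu)\,\frac{1}{s^2}\,(\bar{x} - \mu)
     = \left( \frac{\bar{x} - \mu}{s / \sqrt{n}} \right)^{2} = t^2 \\
F   &= \frac{n - 1}{1 \cdot (n - 1)}\, T^2 = T^2 = t^2
\end{align*}
% F then has 1 and (n - 1) degrees of freedom, which is the familiar
% relation between the square of a t-statistic and the F-distribution.
```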
Although the expression of this test in a form such as Equation (6.30) is easy,
computation of a test value for an actual data set may be very laborious. For example,
suppose we have measured the content of four elements in seven lunar
samples. We wish to test the hypothesis that these samples have been drawn from
a population having the same mean as terrestrial basalts. Assume we take our
values for the population means from the Handbook of Physical Constants (Clark,
1966, p. 4). Hotelling’s $T^2$ seems appropriate to test the hypothesis that the vector
of lunar sample means is no different from the vector of basalt means given in this
reference.
We must first compute the vector of four sample means and the $4 \times 4$ matrix of
variances and covariances. The vector of differences between sample and population
means, $(\bar{\mathbf{X}} - \boldsymbol{\mu})$, must also be computed. Next, we must find the inverse of the
variance-covariance matrix, or $\mathbf{S}^{-1}$. We then must perform two matrix multiplications,
$(\bar{\mathbf{X}} - \boldsymbol{\mu})' \mathbf{S}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu})$, and multiply by n to produce $T^2$. From this description,
you can appreciate that the computational effort becomes increasingly greater as
the number of variables grows larger.
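The sequence of steps just described can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the procedure used to prepare the book’s tables. The function names are our own, the small data set is invented purely for demonstration (it is not the lunar data), and the explicit inverse is replaced by solving the equivalent linear system, which gives the same product with the difference vector.

```python
from typing import List, Sequence


def mean_vector(data: Sequence[Sequence[float]]) -> List[float]:
    """Column means of an n-by-m data table."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance_matrix(data: Sequence[Sequence[float]],
                      means: Sequence[float]) -> List[List[float]]:
    """Sample variance-covariance matrix S, using the divisor (n - 1)."""
    n, m = len(data), len(means)
    return [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
             for k in range(m)]
            for j in range(m)]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance-covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(aug[i][c] * x[c] for c in range(i + 1, m))
        x[i] = (aug[i][m] - known) / aug[i][i]
    return x


def hotelling_t2(data: Sequence[Sequence[float]], mu: Sequence[float]) -> float:
    """One-sample Hotelling T^2 = n (xbar - mu)' S^-1 (xbar - mu)."""
    n = len(data)
    xbar = mean_vector(data)
    s = covariance_matrix(data, xbar)
    diff = [xb - p for xb, p in zip(xbar, mu)]
    s_inv_diff = solve(s, diff)  # equals S^-1 (xbar - mu) without forming S^-1
    return n * sum(d * w for d, w in zip(diff, s_inv_diff))


# Invented two-variable example (NOT the lunar data of Table 6-6).
EXAMPLE_DATA = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
EXAMPLE_MU = [2.0, 2.0]
print("T^2 for the invented example:", hotelling_t2(EXAMPLE_DATA, EXAMPLE_MU))
```

Solving the system rather than inverting the matrix is a design choice made here for brevity and numerical stability; for a hand calculation the explicit inverse described in the text serves equally well.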
The data for the seven lunar samples are listed in Table 6-6, with the “population”
means from Clark. Intermediate values in the computation of $T^2$ are also
given, with the final test value of $T^2$ and the equivalent F-statistic, which has m
and (n - m) degrees of freedom. The test statistic of F = 73.11 far exceeds the
critical value of $F_{4,3,0.01} = 28.71$, so we conclude that the mean composition of the
sample of lunar basalts is significantly different from the mean composition of the
population of terrestrial basalts.
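The conversion in Equation (6.31) and the decision just reached can be retraced with a brief Python sketch. Only the quantities quoted in the text are used (n = 7, m = 4, F = 73.11, and the critical value 28.71). The function names are our own, and the value of $T^2$ is back-calculated from the reported F rather than copied from Table 6-6, so it should be read as an implied figure.

```python
def t2_to_f(t2: float, n: int, m: int) -> float:
    """Equation (6.31): F = (n - m) / (m (n - 1)) * T^2."""
    if not 0 < m < n:
        raise ValueError("requires 0 < m < n")
    return (n - m) / (m * (n - 1)) * t2


def f_to_t2(f: float, n: int, m: int) -> float:
    """Equation (6.31) rearranged to recover T^2 from F."""
    if not 0 < m < n:
        raise ValueError("requires 0 < m < n")
    return m * (n - 1) / (n - m) * f


N_SAMPLES = 7        # lunar samples
N_VARIABLES = 4      # elements measured
F_REPORTED = 73.11   # test statistic quoted in the text
F_CRITICAL = 28.71   # critical F for 4 and 3 degrees of freedom, alpha = 0.01

degrees_of_freedom = (N_VARIABLES, N_SAMPLES - N_VARIABLES)
t2_implied = f_to_t2(F_REPORTED, N_SAMPLES, N_VARIABLES)
reject_equal_means = F_REPORTED > F_CRITICAL

print("degrees of freedom:", degrees_of_freedom)
print("implied T^2:", round(t2_implied, 2))
print("reject hypothesis of equal means:", reject_equal_means)
```

With only three degrees of freedom in the denominator the critical value is large, which is why so extreme a test statistic is needed before the hypothesis of equal means can be rejected.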
We have dwelled on the $T^2$ test against a known mean not because this specific
test has greater utility in geology than other multivariate tests, but to illustrate the
close relationship between conventional statistics and multivariate statistics. Multivariate
equivalents can be formulated directly from most univariate tests with the
proper expansion of the basic assumptions. However, the transition from ordinary
algebra to matrix algebra often obscures the underlying similarity between the two
applications. Although we usually regard multivariate methods as an extension of
univariate statistics, univariate, or ordinary, statistical analysis should be considered
as a special subset of the general area of multivariate analysis.