Page 165 - Statistics and Data Analysis in Geology
Statistics and Data Analysis in Geology - Chapter 6
The Euclidean distance and its square, unfortunately, are expressed as
hodgepodges of the original units of measurement. To be interpretable, they must be
standardized. Comparison with Equation (6.20) suggests that standardization
must involve division by the multivariate equivalent of the variance, which is the
variance-covariance matrix S. Of course, division is not a defined operation in
matrix algebra, but we can accomplish the same end by multiplying by the inverse.
Multiplying Equation (6.24) by the inverse of the variance-covariance matrix yields
the standardized squared distance,
$$D^2 = \mathbf{D}'\,\mathbf{S}^{-1}\,\mathbf{D} \tag{6.25}$$
This standardized measure of difference between the means of two multivariate
groups is called Mahalanobis’ distance. Substituting quantities from Table 6-5
into Equation (6.25), we obtain
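$$D^2 =
\begin{bmatrix} -0.010 & -0.043 \end{bmatrix}
\begin{bmatrix} 59{,}098.305 & 4311.640 \\ 4311.640 & 747.058 \end{bmatrix}
\begin{bmatrix} -0.010 \\ -0.043 \end{bmatrix}
= 11.172$$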
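The substitution above can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: the names `d`, `s_inv`, and `quadratic_form` are assumptions, and the numbers are the rounded entries printed in the equation. With those rounded inputs the result comes out close to 11.0 rather than 11.172, presumably because the printed answer was computed from unrounded mean differences.

```python
# Mean-difference vector D and inverse variance-covariance matrix S^-1,
# copied as printed (rounded) in the worked example above.
d = [-0.010, -0.043]
s_inv = [
    [59098.305, 4311.640],
    [4311.640, 747.058],
]


def quadratic_form(vec, mat):
    """Return vec' * mat * vec for a vector and a square matrix given as lists."""
    # First form the matrix-vector product mat * vec ...
    mat_vec = [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in mat]
    # ... then take its dot product with vec.
    return sum(v_i * mv_i for v_i, mv_i in zip(vec, mat_vec))


# Squared Mahalanobis distance, Equation (6.25).
d_squared = quadratic_form(d, s_inv)
print(round(d_squared, 3))
```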
Interestingly, we can obtain exactly the same distance measure by substituting
the vector of mean differences into the discriminant function equation itself:
$$D^2 =
\begin{bmatrix} -0.010 & -0.043 \end{bmatrix}
\begin{bmatrix} -783.442 \\ -75.602 \end{bmatrix}
= 11.172$$
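The agreement between the two calculations is not a coincidence if, as is usual for a two-group linear discriminant function, the coefficients are obtained as the product of the inverse variance-covariance matrix and the vector of mean differences. The sketch below checks this with the rounded values printed in the text; the variable names are assumptions introduced for illustration, and the results match the printed figures only to within rounding.

```python
# Rounded values as printed in the text.
d = [-0.010, -0.043]           # vector of mean differences
lam = [-783.442, -75.602]      # discriminant function coefficients
s_inv = [
    [59098.305, 4311.640],
    [4311.640, 747.058],
]

# D' * lambda: the mean differences substituted into the discriminant function.
d_squared_from_lambda = sum(d_i * l_i for d_i, l_i in zip(d, lam))

# S^-1 * D: should approximately reproduce the printed coefficients,
# differing only because d is rounded to three decimals.
lam_from_s_inv = [sum(s_ij * d_j for s_ij, d_j in zip(row, d)) for row in s_inv]

print(round(d_squared_from_lambda, 3))
print([round(x, 3) for x in lam_from_s_inv])
```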
Mahalanobis’ distance can be visualized on Figure 6-3, where it is equal to the
distance between $R_A$ and $R_B$.
The significance of Mahalanobis’ distance can be tested using a multivariate
equivalent of the t-test of the equality of two means, called Hotelling’s $T^2$ test. We
will discuss this test more extensively in the next section. Here, we simply note
that it has the form
$$T^2 = \frac{n_a n_b}{n_a + n_b}\, D^2 \tag{6.26}$$
and can be transformed to an F-test. The test of multivariate equality, using this
more familiar statistic, is
$$F = \left( \frac{n_a + n_b - m - 1}{(n_a + n_b - 2)\, m} \right) \left( \frac{n_a n_b}{n_a + n_b} \right) D^2 \tag{6.27}$$
with $m$ and $(n_a + n_b - m - 1)$ degrees of freedom. The null hypothesis tested
by this statistic is that the two multivariate means are equal, or that the distance
between them is zero. That is,
$$H_0: D = 0$$
against
$$H_1: D > 0$$
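Equations (6.26) and (6.27) translate directly into code. The following is a minimal sketch; the function names are assumptions, and the sample sizes in the usage lines are hypothetical, since the group sizes are not given in this passage. Only the value of $D^2$ and the number of variables (two, from the length of the mean-difference vector) come from the text.

```python
def hotelling_t2(d_squared, n_a, n_b):
    """Hotelling's T^2 from the squared Mahalanobis distance, Equation (6.26)."""
    return (n_a * n_b) / (n_a + n_b) * d_squared


def hotelling_f(d_squared, n_a, n_b, m):
    """Return (F, df1, df2) for the test of equal multivariate means, Equation (6.27).

    m is the number of variables; n_a and n_b are the two group sizes.
    """
    t2 = hotelling_t2(d_squared, n_a, n_b)
    df1 = m
    df2 = n_a + n_b - m - 1
    f_stat = df2 / ((n_a + n_b - 2) * m) * t2
    return f_stat, df1, df2


# Hypothetical group sizes, chosen only to show the call.
f_stat, df1, df2 = hotelling_f(11.172, n_a=30, n_b=40, m=2)
print(f_stat, df1, df2)
```

The resulting F value would then be compared with a tabulated critical value for `df1` and `df2` degrees of freedom at the chosen significance level.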
The appropriateness of this as a test of a discriminant function should be
apparent. If the means of the two groups are very close together, it will be difficult to
tell them apart, especially if both groups have large variances. In contrast, if the two
means are well separated and scatter about the means is small, discrimination will