Page 228 - MATLAB Recipes for Earth Sciences


9 Multivariate Statistics

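data = load('sediments.txt');   % one row per sample, one column per mineral
for i=1:10
    sample(i,:) = ['sample',sprintf('%02.0f',i)];   % labels 'sample01' ... 'sample10'
end
clear i
minerals= ['amp';'pyr';'pla';'ksp';'qtz';'cla';'flu';'sph';'gal'];   % mineral abbreviations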

Subsequently, we compute the distances between all pairs of samples. The
function pdist provides many ways of computing this distance, such as
the Euclidean or the Manhattan distance (option 'cityblock'). We use the
default setting, which is the Euclidean distance.

Y = pdist(data);
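A minimal sketch to make the default metric concrete, assuming the Statistics and Machine Learning Toolbox that provides pdist: it recomputes the Euclidean distances with an explicit loop over a small hypothetical matrix Xtoy (made-up values, not the sediment data) and compares them with the output of pdist. The loop also reproduces the order in which pdist stores the pairs: (1,2), (1,3), ..., (1,m), (2,3), ..., (m-1,m). The names Xtoy, Ytoy, Ycheck and Ycity are illustrative and chosen so that they do not overwrite data and Y.

```matlab
% Hypothetical toy data: 4 samples (rows) x 2 variables (columns)
Xtoy = [0 0; 3 4; 6 8; 1 1];
m = size(Xtoy,1);

% Euclidean distances from pdist (default metric)
Ytoy = pdist(Xtoy);

% The same distances computed explicitly, in the pair order used by pdist
Ycheck = zeros(1,m*(m-1)/2);
k = 0;
for p = 1:m-1
    for q = p+1:m
        k = k + 1;
        Ycheck(k) = sqrt(sum((Xtoy(p,:) - Xtoy(q,:)).^2));
    end
end
disp(max(abs(Ytoy - Ycheck)))   % should be zero up to rounding

% The Manhattan (city block) distance is selected with 'cityblock'
Ycity = pdist(Xtoy,'cityblock');
disp(Ycity)
```

With the ten samples of the sediment data, Y therefore holds 10*9/2 = 45 distances.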
The function pdist returns a row vector Y containing the distances between
each pair of observations (rows) in the original data matrix; for m
observations, Y has m(m-1)/2 elements. We can visualize the distances on
another pseudocolor plot.
D = squareform(Y);   % distances as a symmetric square matrix
imagesc(D), colormap(hot)
title('Euclidean distance between pairs of samples')
xlabel('First Sample No.')
ylabel('Second Sample No.')
colorbar
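The plot above relies on squareform to turn the vector Y into a matrix. The following sketch, again on a hypothetical four-sample toy matrix (made-up values; the names Xtoy, Ytoy, Dtoy, dpq and Yback are illustrative), checks the properties of that conversion: element (p,q) equals the distance between samples p and q, the matrix is symmetric with a zero diagonal, and applying squareform to the square matrix recovers the vector form.

```matlab
% Hypothetical toy data: 4 samples (rows) x 2 variables (columns)
Xtoy = [0 0; 3 4; 6 8; 1 1];
Ytoy = pdist(Xtoy);          % 1-by-6 vector of pairwise distances
Dtoy = squareform(Ytoy);     % 4-by-4 symmetric distance matrix

% Element (p,q) holds the distance between samples p and q
p = 2; q = 4;
dpq = norm(Xtoy(p,:) - Xtoy(q,:));   % Euclidean distance computed directly
disp([Dtoy(p,q) dpq])

% Converting the square matrix back yields the original vector
Yback = squareform(Dtoy);
```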

The function squareform converts Y into a symmetric, square format, so
that the element (i,j) of the matrix denotes the distance between the ith
and jth objects in the original data. Next, we use the function linkage to
rank and link the samples hierarchically according to their distances, so
that the most similar (least distant) samples are linked first.
Z = linkage(Y);

In this 3-column array Z, each row identifies a link. The first two columns
identify the objects (or samples) that have been linked, and the third column
contains the distance between these two objects. Indices larger than the
number of samples m refer to clusters formed earlier: the cluster created in
row k receives the index m+k. The first row (link), between objects (or
samples) 1 and 2, has the smallest distance, corresponding to the highest
similarity. Finally, we visualize the hierarchical clusters as a dendrogram,
which is shown in Figure 9.4.

dendrogram(Z);
xlabel('Sample No.')
ylabel('Distance')
box on
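The structure of Z is easiest to see on a tiny hypothetical example (five made-up values of a single variable, not the sediment data; the names Xtoy, Ztoy and Ttoy are illustrative). With the default single linkage, samples 1 and 2 are joined first (distance 1), then samples 3 and 4 (distance 2), then the new clusters 6 and 7 (distance 4, the smallest distance between their members), and finally sample 5 joins cluster 8 (distance 13). The third column of Ztoy gives the heights at which dendrogram draws the links. The sketch also uses cluster with the 'maxclust' option to cut the tree into two groups.

```matlab
% Hypothetical toy data: five samples measured on one variable
Xtoy = [0; 1; 5; 7; 20];

% Hierarchical clustering with the default (single linkage) method
Ztoy = linkage(pdist(Xtoy));

% Each row of Ztoy is [index index distance]. Indices 1 to 5 are the
% original samples; index 5+k denotes the cluster formed in row k.
disp(Ztoy)

% Cut the tree into at most two clusters and show each sample's group number
Ttoy = cluster(Ztoy,'maxclust',2);
disp(Ttoy')
```

Samples 1 to 4 should end up in one group and sample 5 in the other; the group numbers themselves are arbitrary labels.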
Clustering finds the same groups as the principal component analysis. We
observe clear groups consisting of samples 1, 2, 8 to 10 (the magmatic
