Page 225 - MATLAB Recipes for Earth Sciences


percent_explained =
80.9623
17.1584
0.8805
0.4100
0.2875
0.1868
0.1049
0.0096
0.0000
We see that more than 80% of the total variance is contained in PC1, around
17% is described by PC2, whereas all other PCs together contribute less
than 2% and do not play any significant role. This means that most of the
variability in the data set can be described by two new variables only.
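
As a quick check of this statement, the cumulative percentages can be computed from the values printed above. This is only a minimal sketch: the variable names cumulative_explained, first_two and remaining are introduced here for illustration and are not part of the original recipe.

```matlab
% Percentages of total variance per principal component, copied from
% the output printed above.
percent_explained = [80.9623; 17.1584; 0.8805; 0.4100; 0.2875; ...
    0.1868; 0.1049; 0.0096; 0.0000];

% Running total of the explained variance (illustrative variable name).
cumulative_explained = cumsum(percent_explained);

% Variance captured by the first two PCs and by all remaining PCs.
first_two = cumulative_explained(2)
remaining = sum(percent_explained(3:end))
```

With these numbers, PC1 and PC2 together account for about 98.1% of the total variance, leaving about 1.9% for the remaining seven components.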

9.3 Cluster Analysis

Cluster analysis creates groups of objects that are very similar compared
to other objects or groups. It first computes the similarity between all
pairs of objects, then it ranks the groups by their similarity, and finally
creates a hierarchical tree visualized as a dendrogram. Examples of
grouping objects in earth sciences are the correlations within volcanic
ashes (Hermanns et al. 2000) and the comparison of microfossil assemblages
(Birks and Gordon 1985).
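
The three steps described above (pairwise similarity, ranking of groups, hierarchical tree) might be sketched in MATLAB as follows. This is a hedged illustration, not necessarily the recipe used later in this chapter. It assumes that the Statistics and Machine Learning Toolbox functions pdist, linkage and dendrogram are available. The small matrix data is invented for demonstration only, and the choice of 'average' linkage is an assumption.

```matlab
% Invented example: 5 objects (rows) described by 2 variables (columns).
data = [1.0 2.0; 1.1 2.1; 5.0 6.0; 5.2 5.9; 9.0 1.0];

% Step 1: distances between all pairs of objects
% (pdist uses the Euclidean distance by default).
Y = pdist(data);

% Step 2: rank and merge the groups by their similarity. The 'average'
% linkage method is an assumption made for this sketch.
Z = linkage(Y, 'average');

% Step 3: display the hierarchical tree as a dendrogram.
dendrogram(Z);
```

Each row of Z describes one merge: the indices of the two groups that were joined and the distance at which they were joined. In this example, objects 1 and 2 lie closest together and are therefore merged first.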
There are numerous methods for calculating the similarity between two
data vectors. Let us define two data sets consisting of multiple
measurements on the same object. These data can be described by the vectors:

The most popular measures of similarity of the two sample vectors are

1. Euclidean distance – This is simply the shortest distance between the two
points in the multivariate space.
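
For two vectors of equal length, the Euclidean distance is the square root of the sum of the squared differences between corresponding elements. The following minimal sketch illustrates this. The vectors x1 and x2 and their values are invented here for demonstration and do not come from the original text.

```matlab
% Two invented sample vectors, each holding three measurements.
x1 = [1 2 3];
x2 = [4 6 3];

% Euclidean distance: square root of the sum of squared differences.
d12 = sqrt(sum((x1 - x2).^2))    % displays 5

% The built-in norm of the difference vector gives the same value.
d12_check = norm(x1 - x2);
```

If the Statistics and Machine Learning Toolbox is installed, pdist([x1; x2]) should return the same value, since the Euclidean distance is its default metric.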

The Euclidean distance is certainly the most intuitive measure for similar-
