Page 75 -
P. 75
62 3 Data Clusterinn
Complete linkage
Also called FN (furthest neighbour) rule, this method (previously introduced)
evaluates the dissimilarity between two clusters as the greatest distance between
any two patterns, one from each cluster.
~(Lo~,w~)= max /IX-~JJ.
x€wi,y€wj
This rule performs well when the clusters are compact, roughly globular and of
equal size, as shown in the example of the globular data (Figure 3.5~~). It is,
however, inadequate for filamentary clusters. For instance, for the filamentary data
of Figure 3.4b, the 2 and 3 cluster solutions obtained are of globular shape and
therefore inadequate. The 2 cluster solution is: {I, 2, 3, 4, 9, A, B, C] and (5, 6, 7,
8, D, E, F, G, H, 11.
The single and complete linkage rules represent two extremes in dissimilarity
assessment and tend to be sensitive to atypical patterns, since they depend only on
nearest or furthest neighbours, thereby discarding a lot of information. The
following rules use all available information in an averaging approach and are
therefore less sensitive to atypical patterns.
Average linkage between groups
Also known as UPGMA (un-weighted pair-group method using arithmetic
averages), this rule assesses the distance between two clusters as the average of the
distances between all pairs of patterns, each pattern from a distinct cluster:
This method is quite effective for several types of cluster shapes.
Average linkage within groups
This rule, known as UWGM (un-weighted within-group method using arithmetic
averages), is a variant of the previous one, where the distance between two clusters
is taken to be the average of the distances of all possible pairs of patterns, as if they
formed a single cluster:
Notice that this formula is equivalent to formula (3-1) when applied to all
clusters.
This method is quite efficient when we intend to keep the "volume" of the
resulting cluster as small as possible, minimizing the within-cluster average error