3.3 Tree Clustering 59
The vertical icicle plot represents the hierarchical clustering tree and must be
inspected bottom-up. Initially, as in step 1 of the merging algorithm, we have 18
clusters of singleton patterns. Next, patterns B and C are merged together as they
are at the smallest Euclidean distance. Consider now that we are at a linkage
distance of 4 with the following clusters: ω₁={I, D}, ω₂={A, B, C, E, F, G, H},
ω₃={8, 9}, ω₄={1, 2, 3, 4, 5, 6, 7}. In the next merging phase the distances of the
furthest pair of patterns (each pattern from a distinct cluster) of {ω₁, ω₂}, {ω₁, ω₃},
{ω₁, ω₄}, {ω₂, ω₃}, {ω₂, ω₄}, {ω₃, ω₄} are computed. The smallest of these
distances corresponds to {ω₁, ω₂}; therefore these are clustered next. This process
continues until finally all patterns are merged into one cluster.
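The merging phase described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used to produce the figures: the function names and the point coordinates are hypothetical, and the furthest-pair rule is applied by brute force.

```python
from itertools import combinations
from math import dist


def furthest_pair_distance(a, b, points):
    """Distance between clusters a and b: the furthest pair of patterns,
    one pattern taken from each cluster."""
    return max(dist(points[p], points[q]) for p in a for q in b)


def merge_bottom_up(points):
    """Start from singleton clusters and repeatedly merge the two clusters
    whose furthest-pair distance is smallest.
    Returns the schedule as a list of (linkage distance, merged cluster)."""
    clusters = [frozenset([name]) for name in points]
    schedule = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: furthest_pair_distance(pair[0], pair[1], points),
        )
        d = furthest_pair_distance(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        schedule.append((d, a | b))
    return schedule


# Hypothetical coordinates, NOT the globular data of Figure 3.5.
points = {
    "A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0),
    "1": (10.0, 10.0), "2": (11.0, 10.0),
}
schedule = merge_bottom_up(points)
for d, merged in schedule:
    print(f"{d:6.2f}  {sorted(merged)}")
```

Each printed row corresponds to one level of the icicle plot, read bottom-up; the last row is the single cluster containing all patterns.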
Figure 3.5. (a) Vertical icicle plot for the globular data; (b) Clustering schedule
graph.
Figure 3.5b shows the clustering schedule graph, which may be of help for
selecting the best cluster solutions. These usually correspond to a plateau before a
high jump in the distance measure. In this case the best cluster solution has two
clusters, corresponding to the somewhat globular clouds {1, 2, 3, 4, 5, 6, 7, 8, 9}
and {A, B, C, D, E, F, G, H, I}. Notice also that, usually, the more meaningful
solutions have balanced clusters (in terms of the number of cases) or, to put it in
another way, solutions with very small or even singleton clusters are rather
suspicious.
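The "plateau before a high jump" rule can be written down as a small heuristic. The sketch below assumes the clustering schedule is available as a list of linkage distances in merge order; the function name and the numbers are hypothetical, and the rule is only a guide, not a guarantee of the best solution.

```python
def clusters_before_largest_jump(merge_distances):
    """merge_distances[i] is the linkage distance of merge i+1, in merge order.
    With n patterns there are n-1 merges, and n-(i+1) clusters remain after
    merge i+1. Returns the number of clusters just before the largest jump.
    Requires at least two merges."""
    n = len(merge_distances) + 1
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    # Index i marks the jump between merge i+1 and merge i+2.
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return n - (i + 1)


# Hypothetical schedule for 6 patterns (5 merges), not read from Figure 3.5b.
distances = [1.0, 1.2, 1.5, 2.0, 9.0]
best = clusters_before_largest_jump(distances)
print(best)
```

In practice the suggested count should still be checked against the cluster sizes, since a solution with singleton clusters may sit before a large jump as well.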
Let us now consider the crimes data for the Portuguese towns represented by the
scatter plot of Figure 3.6a. Using the complete linkage method the dendrogram of
Figure 3.6b is obtained. A dendrogram is just like a horizontal icicle plot.
Dendrogram inspection shows that there is a substantial increase in the
dissimilarity measure when passing from 3 to 2 clusters. It seems, therefore, that an
interesting cluster solution from the point of view of summarizing the data is:
Cluster 1 = {Aveiro, Setúbal, V. Castelo}: High incidence of crimes against
property; above average against persons.