Page 71 -
P. 71
5 8 3 Data Clustering
3.3 Tree Clustering
Hierarchical or tree clustering algorithms allow us to reveal the internal similarities
of a given pattern set and to structure these similarities hierarchically. They are
usually applied to a small set of typical patterns.
For n patterns these algorithms generate a sequence of 1 to n clusters. This
sequence has the form of a binary tree (two branches for each tree node) and can be
structured bottom up - merging algorithm -, starting with the individual patterns,
or top down - splitting algorithm -, starting with a cluster composed of all
patterns.
The merging algorithm consists of the following steps:
1. Given n patterns xi consider initially c = n singleton clusters mi = { xi )
2. While c 2 1 do
2.1. Determine the two nearest clusters mi and q using an appropriate similarity
measure and rule.
2.2. Merge q and q: uv=(mi, q 1, therefore obtaining a solution with c-1
clusters.
2.3. Decrease c.
Notice that in step 2.1 the determination of the nearest clusters depends both on
the similarity measure and the rule used to assess the similarity of pairs of clusters.
Let us consider the Globular data (Cluster.xls) shown in Figure 3.4a and assume
that the similarity between two clusters is assessed by measuring the similarity of
its furthest pair of patterns (each one from a distinct cluster). This is the so-called .
complete linkage rule. As the merging process evolves, the similarity measure of
the merged clusters decreases or, putting it in another way, their dissimilarity
increases. This is illustrated in the binary tree shown as a vertical icicle plot in
Figure 3Sa, for the complete linkage merging process of the 18 patterns.
a XI b X1
Figure 3.4. Globular (a) and filamentary (b) datasets for comparison of clustering
methods.