of formula (3-1). The solution of the cross data (Figure 3.2b) was obtained with
this rule (available in the SPSS software). Figure 3.9 shows the corresponding
dendrogram.
Ward's method
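In Ward's method the sum of the squared within-cluster distances is computed for
the cluster that results from merging two clusters $\omega_i$ and $\omega_j$:
$$ d(\omega_i, \omega_j) = \sum_{x \in \omega_i \cup \omega_j} \| x - m \|^2 , $$
where m is the centroid of the merged cluster.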
At each step the two clusters that merge are the ones that produce the
smallest increase in the overall sum of the squared within-cluster distances. This
method is reminiscent of the ANOVA statistical test, in the sense that it tries to
minimize the intra-cluster variance and therefore to maximize the cluster
separability. In general the method produces very good solutions, although it
tends to create clusters of small size.
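The merging criterion can be illustrated with a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the SPSS implementation: the function names (`within_ss`, `ward_increase`, `ward_clustering`) and the toy data are hypothetical, distances are Euclidean, and the search over cluster pairs is deliberately naive. For two clusters with n_i and n_j points and centroids m_i and m_j, the increase computed by `ward_increase` equals n_i n_j / (n_i + n_j) times the squared distance between m_i and m_j, which is a convenient check on the code.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def within_ss(points):
    """Sum of squared Euclidean distances of the points to their centroid."""
    m = centroid(points)
    return sum((x - mk) ** 2 for p in points for x, mk in zip(p, m))

def ward_increase(a, b):
    """Increase of the overall within-cluster sum of squares if a and b merge."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)

def ward_clustering(data, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (illustration only)."""
    clusters = [[tuple(p)] for p in data]  # start with one cluster per point
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases the overall sum of squares least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_increase(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical toy data: two tight pairs and one isolated point.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = ward_clustering(toy, 3)
```

With these toy points the two tight pairs are merged first and the isolated point stays on its own, since any merge that involves it would increase the overall sum of squares by much more.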
Figure 3.9. Dendrogram of the +Cross data clustering using the UWGMA rule
(horizontal axis: Rescaled Distance Cluster Combine).
3.3.2 Tree Clustering Experiments
As with any clustering method, when performing tree-clustering experiments it is
important to choose appropriate metrics and linkage rules guided by the inspection
of the scatter diagram of the data. Let us consider, as an illustration, the crimes
data, which is shown in the scatter diagram of Figure 3.6a. Euclidean or squared