
3.3 Tree Clustering   59

                       The vertical icicle plot represents the hierarchical clustering tree and must be
                     inspected bottom-up. Initially, as in step 1 of the merging algorithm, we have 18
                     clusters of singleton patterns. Next, patterns B and C are merged together as they
                     are at the smallest Euclidean distance. Consider now that we are at a linkage
                     distance of 4 with the following clusters: ω1={I, D}, ω2={A, B, C, E, F, G, H},
                     ω3={8, 9}, ω4={1, 2, 3, 4, 5, 6, 7}. In the next merging phase the distances of the
                     furthest pair of patterns (each pattern from a distinct cluster) of {ω1, ω2}, {ω1, ω3},
                     {ω1, ω4}, {ω2, ω3}, {ω2, ω4}, {ω3, ω4} are computed. The smallest of these
                     distances corresponds to {ω1, ω2}, therefore these are clustered next. This process
                     continues until finally all patterns are merged into one cluster.
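
The merging phase described above can be sketched in a few lines of Python. This is only a minimal illustration of furthest-pair (complete linkage) merging, not the implementation used to produce the figures; the function names and the five 2-D patterns are hypothetical and are not the globular data of the text.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage_distance(a, b, points):
    """Distance of the furthest pair of patterns, one from each cluster."""
    return max(dist(points[p], points[q]) for p in a for q in b)


def agglomerate(points):
    """Merge clusters until one remains; return [(distance, merged_cluster), ...]."""
    clusters = [frozenset([label]) for label in points]  # singleton clusters
    schedule = []
    while len(clusters) > 1:
        # Compute the furthest-pair distance for every pair of clusters
        # and select the pair for which it is smallest.
        d, a, b = min(
            ((complete_linkage_distance(a, b, points), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        schedule.append((d, a | b))
    return schedule


# Hypothetical patterns, chosen only to illustrate the procedure.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0),
          "1": (10.0, 10.0), "2": (11.0, 10.0)}
schedule = agglomerate(points)
for d, merged in schedule:
    print(f"{d:.2f}", sorted(merged))
```

Each entry of `schedule` records the linkage distance at which a merge took place, which is the information displayed by an icicle plot or a clustering schedule graph.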




















                      Figure 3.5. (a) Vertical icicle plot for the globular data; (b) Clustering schedule
                      graph.



                        Figure 3.5b shows the clustering schedule graph, which may be of help for
                       selecting the best cluster solutions. These usually correspond to a plateau before a
                       high jump in the distance measure. In this case the best cluster solution has two
                       clusters, corresponding to the somewhat globular clouds {1, 2, 3, 4, 5, 6, 7, 8, 9}
                       and {A, B, C, D, E, F, G, H, I}. Notice also that, usually, the more meaningful
                       solutions have balanced clusters (in terms of the number of cases) or, to put it in
                       another way, solutions with very small or even singleton clusters are rather
                       suspicious.
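
The "plateau before a high jump" criterion is normally applied by visual inspection of the schedule graph. As a rough sketch, assuming the largest single increase between consecutive merge distances is taken as the high jump (a simplification, not a rule stated in the text), the number of clusters can be read off as follows. The function name and the distance values are hypothetical.

```python
def clusters_before_largest_jump(merge_distances):
    """Number of clusters present just before the largest jump in distance.

    merge_distances: linkage distances in merge order, i.e. n - 1 values
    for n patterns.
    """
    if len(merge_distances) < 2:
        raise ValueError("at least two merges are needed to measure a jump")
    n = len(merge_distances) + 1
    # Increase in linkage distance from each merge to the following one.
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    # Merges 0..k precede the jump, leaving n - (k + 1) clusters.
    return n - (k + 1)


# Hypothetical schedule for five patterns (four merges).
distances = [1.0, 1.0, 2.2, 14.9]
n_clusters = clusters_before_largest_jump(distances)
print(n_clusters)
```

A solution selected in this way should still be checked for balance, since a large jump can also be caused by a single outlying pattern being merged last.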
                         Let us now consider the crimes data for the Portuguese towns represented by the
                       scatter plot of  Figure 3.6a. Using the complete linkage method the dendrogram of
                       Figure 3.6b is obtained. A dendrogram is just like a horizontal icicle plot.
                         Dendrogram  inspection  shows  that  there  is  a  substantial  increase  in  the
                       dissimilarity measure when passing from 3 to 2 clusters. It seems, therefore, that an
                       interesting cluster solution from the point of  view of summarizing the data is:

                         Cluster 1 = {Aveiro, Setúbal, V. Castelo}: High incidence of crimes against
                       property; above average against persons.