Page 71 -
P. 71

5 8    3 Data Clustering




         3.3  Tree Clustering


         Hierarchical or tree clustering algorithms allow us to reveal the internal similarities
         of  a given  pattern  set  and to structure these similarities hierarchically. They  are
         usually applied to a small set of typical patterns.
           For  n patterns these  algorithms generate a  sequence of  1 to  n clusters. This
         sequence has the form of a binary tree (two branches for each tree node) and can be
         structured bottom up - merging algorithm -, starting with  the individual patterns,
         or  top  down  - splitting  algorithm  -,  starting  with  a  cluster  composed  of  all
         patterns.
           The merging algorithm consists of the following steps:

         1. Given n patterns xi consider initially c = n singleton clusters mi = {  xi  )
         2.  While c 2 1 do
           2.1. Determine the two nearest clusters mi and q using an appropriate similarity
               measure and rule.
           2.2.  Merge  q and  q: uv=(mi, q 1,  therefore  obtaining  a  solution  with  c-1
               clusters.
           2.3. Decrease c.

           Notice that in step 2.1 the determination of the nearest clusters depends both on
         the similarity measure and the rule used to assess the similarity of pairs of clusters.
           Let us consider the Globular data (Cluster.xls) shown in Figure 3.4a and assume
         that the similarity between two clusters is assessed by measuring the similarity of
         its furthest pair of patterns (each one from a distinct cluster). This is the so-called   .
         complete  linkage rule. As the merging process evolves, the similarity measure of
         the  merged  clusters  decreases  or,  putting  it  in  another  way,  their  dissimilarity
         increases.  This is  illustrated in  the binary  tree shown as a vertical icicle plot  in
         Figure 3Sa, for the complete linkage merging process of the 18 patterns.















          a                XI              b                 X1
         Figure 3.4. Globular (a) and filamentary (b) datasets for comparison of clustering
         methods.
   66   67   68   69   70   71   72   73   74   75   76