Page 73 -
P. 73

60      3 Data Clustering


                                Cluster 2 = {Beja, Braga, Braganqa, Coimbra, Porto,  Santarkm, Viseu): High
                             incidence of crimes against property; below average against persons.
                                Cluster  3  =  {C. Branco,  ~vora, Faro,  Guarda,  Leiria,  Lisboa,  Portalegre,  V.
                             Real): Average incidence of crimes against property and persons.





                                 I                        A".,,.   I











                              Figure  3.6.  Crimes  dataset:  (a)  Scatter  plot;  (b)  Dendrogram  using  complete
                              linkage clustering.



                                The  splitting  version  of  the  hierarchical  clustering  algorithm,  previously
                              mentioned, operates in a top-down fashion, initially considering one cluster formed
                              by  all  patterns.  Afterwards,  the  algorithm  proceeds  to  consider  all  possible
                              partitions (2 in the first step, 3 in the second step, and so on until n in the last step),
                              choosing at each step the partition that minimizes a dissimilarity measure. In this
                              way  a dendrogram  with  inverse structure, starting with  one cluster and  finishing
                              with n clusters, is obtained.
                                As  the  merging  algorithm  is  computationally  easier  and  quicker  than  the
                              splitting algorithm and produces similar solutions, it is the preferred approach.


                              3.3.1  Linkage Rules

                              In the examples presented so far we have made reference to two ways of evaluating
                              the  dissimilarity  between  clusters:  the  within-cluster  average  error  and  the
                              complete  linkage  rule.  Actually,  there  are  several  methods  of  evaluating  the
                              dissimilarity between clusters, the so-called linkage rules. The most used ones are
                              now presented.



                              Single linkage
                              Also  called  NN  (nearest  neighbour)  rule,  the  single  linkage  rule  evaluates  the
                              dissimilarity  between  two  clusters,  d(wi, q), as  the  dissimilarity  of  the  nearest
                              patterns, one from each cluster:
   68   69   70   71   72   73   74   75   76   77   78