
56     3 Data Clustering


              Therefore, minimizing E corresponds in a certain sense to minimizing the
            cluster volumes, which seems a sensible decision. If we use a Euclidean metric to
            evaluate the distance between the pairs of feature vectors, we obtain the cluster
            solution depicted in Figure 3.2a. This solution is quite distinct from the visual
            clustering solution, shown in Figure 3.2b, which is obtained using the city-block
            metric. If we had used the xCross data instead, it is easy to see that the only
            adequate metric in that case would be the Chebychev metric.
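
            The three metrics mentioned above can be compared with a minimal sketch. The
            function names and the sample points are illustrative assumptions, not taken
            from the Cross or xCross datasets; the sketch only shows that the nearest
            neighbour of a pattern can change with the metric.

```python
import math

def euclidean(x, y):
    """Euclidean metric: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """City-block metric: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    """Chebychev metric: largest absolute difference over the features."""
    return max(abs(a - b) for a, b in zip(x, y))

def nearest(reference, candidates, metric):
    """Return the label of the candidate closest to `reference` under `metric`."""
    return min(candidates, key=lambda label: metric(reference, candidates[label]))

# Hypothetical feature vectors, chosen only for illustration.
p = (0.0, 0.0)
q = (3.0, 4.0)
candidates = {"a": (3.0, 3.0), "b": (0.0, 4.5)}

print(euclidean(p, q), city_block(p, q), chebychev(p, q))  # 5.0 7.0 4.0
for metric in (euclidean, city_block, chebychev):
    print(metric.__name__, nearest(p, candidates, metric))
```

            With these points, the Euclidean and Chebychev metrics pick "a" as the nearest
            neighbour of the origin, whereas the city-block metric picks "b".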
              As we see from the cross data example, clustering solutions may depend
            drastically on the metric used. There is, however, a subtle point concerning the
            measurement scale of the features that must be emphasized. In the following
            chapters we will design classifiers using supervised methods, which are in
            principle independent of (or only moderately dependent on) the measurement scale
            of the features. In data clustering, by contrast, the solutions obtained may also
            vary drastically with the measurement scale or type of standardization of the
            features.








            [Figure 3.3: two scatter plots, (a) and (b), of the patterns Faro, Setubal,
            Viseu and V. Real, drawn with different coordinate axes.]

            Figure 3.3. Visual clustering of crimes data with different coordinate axes. (a)
            {Faro, V. Real}, {Setubal, Viseu}; (b) {Viseu, V. Real}, {Setubal, Faro}.





              An example of the influence of feature standardization can be seen in Figure
            3.3, where the same patterns of the Crimes dataset are depicted with different
            positions and scales of the coordinate axes. In Figure 3.3a two clusters can be
            identified visually: {Faro, V. Real} and {Setubal, Viseu}. In Figure 3.3b the
            visual clustering is different: {Viseu, V. Real} and {Faro, Setubal}. Notice that
            this contradictory result is a consequence of a different representation scale of
            the coordinate axes, which corresponds to situations of different true scales: in
            (a) with a shrunken persons scale; in (b) with a shrunken property scale. If the
            true scales were used, one would tend to choose only one cluster. The problem is
            aggravated when the features are of a different nature, measured in different
            units and occupying quite disparate value ranges. This scaling aspect was already
            mentioned in Section 2.3 (see Figure 2.14) when discussing the features'
            contribution to pattern discrimination.
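
            The effect of axis scaling on visual (nearest-neighbour) grouping can be
            reproduced with a small sketch. The four patterns A, B, C, D and their feature
            values are hypothetical and are not the Crimes data; the helper names are
            likewise assumptions made for illustration.

```python
import math

# Hypothetical patterns with two features (not the Crimes dataset).
points = {
    "A": (0.0, 0.0),
    "B": (10.0, 1.0),
    "C": (1.0, 10.0),
    "D": (11.0, 11.0),
}

def rescale(pts, sx, sy):
    """Multiply the first feature by sx and the second by sy."""
    return {name: (x * sx, y * sy) for name, (x, y) in pts.items()}

def nearest_neighbour(pts, name):
    """Return the label of the pattern closest to `name` (Euclidean metric)."""
    others = (k for k in pts if k != name)
    return min(others, key=lambda k: math.dist(pts[k], pts[name]))

shrunk_first = rescale(points, 0.1, 1.0)   # first feature axis shrunk
shrunk_second = rescale(points, 1.0, 0.1)  # second feature axis shrunk

for label, pts in (("first axis shrunk", shrunk_first),
                   ("second axis shrunk", shrunk_second)):
    pairs = {name: nearest_neighbour(pts, name) for name in sorted(pts)}
    print(label, pairs)
```

            Shrinking the first axis pairs A with B and C with D; shrinking the second axis
            pairs A with C and B with D. The same patterns therefore suggest different
            clusters depending only on the scale chosen for each feature.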