
3  Data Clustering










    3.1  Unsupervised Classification


    In the previous chapters, when introducing the idea of similarity as a distance
    between feature vectors, we often computed this distance relative to a prototype
    pattern. We have also implicitly assumed that the shape of the class distributions
    in the feature space around a prototype was known and, based on this, that we
    could choose a suitable distance metric. The knowledge of such class shapes and
    prototypes is obtained from a training set of previously classified patterns. The
    design of a PR system using this "teacher" information is called a supervised
    design. Since our present interest is in classification systems, we will refer to
    it as supervised classification.
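    The supervised design just described can be sketched as a nearest-prototype
    (minimum-distance) classifier: each class prototype is taken as the mean of that
    class's labelled training vectors, and a new pattern is assigned to the class whose
    prototype is closest. This is a minimal sketch under stated assumptions: Euclidean
    distance (suited to roughly spherical class shapes; other shapes would call for a
    different metric, such as the Mahalanobis distance), class means as prototypes, and
    a small hypothetical 2-D training set. The function names are illustrative only.

```python
import math
from collections import defaultdict

def class_prototypes(samples, labels):
    """Compute one prototype per class: the mean of its training vectors.

    samples: list of feature vectors (tuples of floats)
    labels:  class label of each sample, supplied by the "teacher"
    Assumption: the class mean is used as the prototype.
    """
    groups = defaultdict(list)
    for x, c in zip(samples, labels):
        groups[c].append(x)
    # zip(*vecs) yields one column (feature) at a time; average each column.
    return {
        c: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for c, vecs in groups.items()
    }

def nearest_prototype(x, prototypes):
    """Assign x to the class whose prototype is closest.

    Assumption: Euclidean distance, appropriate for roughly spherical classes.
    """
    return min(prototypes, key=lambda c: math.dist(x, prototypes[c]))

# Hypothetical 2-D training set with two labelled classes
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

protos = class_prototypes(train_x, train_y)
print(nearest_prototype((1.1, 0.9), protos))  # A
print(nearest_prototype((3.5, 3.7), protos))  # B
```

    Both the prototypes and the choice of metric depend here on the labels in
    `train_y`, which is exactly the "teacher" information a supervised design relies on.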
      We are often confronted with a more primitive situation, where no previous
    knowledge about the patterns is available or obtainable (after all, we learn to
    classify many things without being taught). Our classifying system must then
    "discover" the internal similarity structure of the patterns in a useful way, which
    requires designing it with a so-called unsupervised approach. The present chapter
    is dedicated to the unsupervised classification of feature vectors, also called
    data clustering. This is essentially a data-driven approach that attempts to
    discover structure within the data itself, grouping the feature vectors into
    clusters.
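    As a minimal illustration of this data-driven idea, the sketch below groups
    unlabelled 2-D feature vectors into k clusters with the k-means procedure:
    alternately assign each vector to its nearest centre, then move each centre to the
    mean of the vectors assigned to it. k-means is only one possible clustering
    technique, chosen here for brevity; the data, the choice k = 2 and the initial
    centres are hypothetical assumptions for illustration.

```python
import math

def kmeans(points, centres, n_iter=100):
    """Minimal k-means: group unlabelled vectors around k centres.

    points:  list of feature vectors (tuples of floats), no class labels
    centres: initial centre guesses, one per cluster (assumed given)
    Returns the final centres and the cluster index of each point.
    """
    centres = [tuple(c) for c in centres]
    assign = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre (Euclidean).
        assign = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        # Update step: move each centre to the mean of its assigned points.
        new_centres = []
        for k, c in enumerate(centres):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centres.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(c)  # keep an empty cluster's centre unchanged
        if new_centres == centres:  # converged: no centre moved
            break
        centres = new_centres
    return centres, assign

# Hypothetical unlabelled feature vectors; no "teacher" information is used.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
centres, assign = kmeans(data, centres=[data[0], data[3]])
print(assign)  # [0, 0, 0, 1, 1, 1]
```

    The cluster indices 0 and 1 only say which vectors belong together; unlike class
    labels in a supervised design, any interpretation of the clusters must be
    attached afterwards.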

      Figure 3.1. Scatter plot of the first 100 cork stoppers, using features N and PRT.