
14     1  Basic Notions


      There are two distinct ways such hypotheses can be obtained:
      - Supervised, concept-driven or inductive hypotheses: find in the representation
        space a hypothesis corresponding to the structure of the interpretation space.
        This is the approach of the previous examples, where, given a set of patterns, we
        hypothesise a solution. To be useful, any hypothesis found to approximate the
        target values in the training set must also approximate unobserved patterns in
        a similar way.
      - Unsupervised, data-driven or deductive hypotheses: find a structure in the
        interpretation space corresponding to the structure in the representation space.
        The unsupervised approach attempts to find a useful hypothesis based only on
        the similarity relations in the representation space.
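      The generalization requirement for supervised hypotheses can be illustrated with a
      minimal sketch. The data below are hypothetical one-dimensional patterns, and the
      hypothesis is a least-squares line, chosen here only as one simple error-minimization
      method; the names fit_line and mse are illustrative, not taken from the text. The
      line is fitted on the training set and then checked on patterns it never saw.

```python
# Hypothetical training patterns (x, target) and "unobserved" patterns (illustrative data only).
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
unobserved = [(5.0, 11.0), (6.0, 12.9)]


def fit_line(points):
    """Least-squares fit of y = a*x + b: a hypothesis obtained by error minimization."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


def mse(points, a, b):
    """Mean squared error of the hypothesis y = a*x + b on a set of patterns."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


a, b = fit_line(train)
train_err = mse(train, a, b)
test_err = mse(unobserved, a, b)
print(f"hypothesis: y = {a:.3f} x + {b:.3f}")
print(f"training MSE = {train_err:.4f}, unobserved MSE = {test_err:.4f}")
```

      A hypothesis is only useful if the second error is of the same order as the first;
      a hypothesis that fitted the training targets well but failed on unobserved patterns
      would not meet the requirement stated above.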

        The hypothesis is derived using learning methods, which can be statistical,
      approximation-based (error minimization) or structural in nature.
        Taking into account how the hypothesis is derived and how pattern similarity is
      measured, we can establish the hierarchical categorization shown in Figure 1.11.
        We proceed to briefly describe the main characteristics and application scope of
      these approaches, which will be explained in detail in the following chapters.


      1.4.1 Data Clustering

      The objective of data clustering is to organize data (patterns) into meaningful or
      useful groups using some type of similarity measure. Data clustering does not use
      any prior class information. It is therefore an unsupervised classification method,
      in the sense that the solutions arrived at are data-driven, i.e., they do not rely on
      any supervisor or teacher.
         Data clustering is useful when one wants to extract some meaning from a pile of
      unclassified information, or in an exploratory phase of pattern recognition research
      for assessing internal data similarities. In section 5.9 we will also present a neural
      network approach that relies on a well-known data clustering algorithm as a first
      processing stage.
         Example of data clustering: given a table containing crop yields per hectare for
      several soil lots, the objective is to cluster these lots into meaningful groups.
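      The crop-yield example can be sketched as follows. The text does not prescribe a
      particular clustering algorithm at this point; k-means with Euclidean distance is
      used here only as one common choice, and the lot names, the two crops and the
      yield figures (in t/ha) are hypothetical. No class labels are supplied: the groups
      emerge solely from the similarity of the yield vectors.

```python
# Hypothetical table: yields (t/ha) of two crops for six soil lots (illustrative data only).
lots = {
    "lot A": (3.1, 7.9),
    "lot B": (2.9, 8.2),
    "lot C": (3.3, 8.0),
    "lot D": (5.8, 4.1),
    "lot E": (6.1, 3.8),
    "lot F": (5.9, 4.3),
}


def squared_distance(p, q):
    """Squared Euclidean distance, used here as the (dis)similarity measure."""
    return sum((u - v) ** 2 for u, v in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns a group label per pattern and the final centroids."""
    # Deterministic initialization with k evenly spaced patterns (an illustrative choice).
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each pattern joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of the patterns assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty group
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


names = list(lots)
labels, centroids = kmeans([lots[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(f"{name} -> group {label}")
```

      The number of groups k must be chosen by the analyst; deciding whether the
      resulting groups are "meaningful" (for instance, whether they reflect soil types)
      remains an interpretation step outside the algorithm.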


      1.4.2 Statistical Classification

      Statistical classification is a long-established, classic approach to pattern
      recognition whose mathematics rests on a solid body of methods and formulas. It
      is essentially based on the use of probabilistic models for the feature vector
      distributions in the classes in order to derive classifying functions. Estimation of
      these distributions is based on a training set of patterns whose classification is
      known beforehand (e.g., assigned by human experts). It is therefore a supervised
      method of pattern recognition, in the sense that the classifier is concept-driven,