
42     2 Pattern Discrimination


          in many textbooks on Statistics, e.g., Hoel (1975) and Siegel and Castellan
          (1998).


          2.5.1 Graphic Inspection

          Graphic inspection allows one to compare the feature distributions of the
          several classes of the problem at hand, and therefore to obtain some insight into
          their usefulness for the class discrimination task. Several types of graphic
          representation can be used, depending on the insight one wishes to obtain.
          Besides histograms, box plots and scatter plots are particularly useful in PR
          problems.




          Figure 2.18. Box plots for the 3 classes of cork stoppers and features PRT (a) and
          PRTG (b):  - median position; I - extremes, lying above or below the box by more
          than 1.5 times the interquartile distance; o - outliers, lying above or below the
          box by more than 3 times the interquartile distance.




            A box plot depicts, for each feature and each class, a box representing the
          interquartile range of the distribution, i.e., the interval covering the central
          50% of the feature values. Figure 2.18 shows box plots of features PRT and PRTG
          for the cork stoppers data. As shown in this figure, a box plot usually includes
          further information, such as the median position and the values lying far above
          or below the box (extremes and outliers).
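          The quantities drawn in each box can be computed directly from a class sample.
          The following Python sketch is only an illustration, not the procedure used to
          produce Figure 2.18: it assumes quartiles obtained with the "inclusive"
          interpolation rule of `statistics.quantiles` (statistical packages use slightly
          different rules), and the function name `box_plot_stats` is hypothetical. Points
          are labelled with the caption's convention: extremes beyond 1.5 times the
          interquartile distance, outliers beyond 3 times.

```python
from statistics import median, quantiles


def box_plot_stats(values, extreme_coef=1.5, outlier_coef=3.0):
    """Summary statistics behind one box of a box plot (hypothetical helper).

    Assumes the labelling convention of the Figure 2.18 caption: a point lying
    more than extreme_coef * IQR outside the box is an 'extreme', and more than
    outlier_coef * IQR outside the box is an 'outlier'.
    """
    # Quartiles with the 'inclusive' interpolation rule (an assumption).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile distance: the height of the box
    extremes, outliers = [], []
    for v in values:
        # Distance by which v lies outside the box (0 if v is inside it).
        excess = max(q1 - v, v - q3, 0.0)
        if excess > outlier_coef * iqr:
            outliers.append(v)
        elif excess > extreme_coef * iqr:
            extremes.append(v)
    return {"q1": q1, "median": median(values), "q3": q3, "iqr": iqr,
            "extremes": extremes, "outliers": outliers}
```

          For the toy sample [1, 2, 3, 4, 5, 6, 7, 8, 15, 100] this gives Q1 = 3.25 and
          Q3 = 7.75, so 15 is flagged as an extreme and 100 as an outlier.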
            As can be seen in Figure 2.18, box plots give a clear indication of the
          discriminating capability of each feature and of the amount of overlap among the
          several classes. In this example, feature PRT seems to discriminate the three
          classes well, whereas feature PRTG seems useful only for discriminating class ω3
          from classes ω1 and ω2.
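          A crude numerical counterpart of this visual comparison is to check, for each
          pair of classes, whether their interquartile boxes intersect. The sketch below
          assumes that heuristic, which is not a method given in the text; the function
          `box_overlaps` and the class labels are hypothetical. Non-overlapping boxes only
          say that the central halves of two distributions are disjoint; their tails may
          still overlap.

```python
from itertools import combinations
from statistics import quantiles


def box_overlaps(samples):
    """For each pair of classes, report whether their interquartile boxes overlap.

    samples: dict mapping a class label to a list of values of one feature.
    Returns a dict mapping (label_a, label_b) -> bool.
    """
    boxes = {}
    for label, values in samples.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        boxes[label] = (q1, q3)
    result = {}
    for a, b in combinations(sorted(boxes), 2):
        (a1, a3), (b1, b3) = boxes[a], boxes[b]
        # Two closed intervals intersect iff each starts before the other ends.
        result[(a, b)] = a1 <= b3 and b1 <= a3
    return result
```

          With hypothetical toy values (not the cork stopper measurements) such as
          w1 = [1, 2, 3, 4, 5], w2 = [3, 4, 5, 6, 7] and w3 = [20, 21, 22, 23, 24], the boxes
          of w1 and w2 overlap, while w3 is separated from both, mimicking a feature that is
          useful only for discriminating one class from the other two.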
            Features with largely overlapping distributions are, of course, of little help
          for classification or regression purposes.
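          One way to put a number on the overlap of two class distributions is the
          intersection of their relative-frequency histograms computed over common bins:
          1 means identical histograms and 0 means that no bin is shared. This is a sketch
          under assumptions: the measure, the default of 10 bins and the name
          `histogram_overlap` are illustrative choices not taken from the text, and the
          result depends on the binning.

```python
def histogram_overlap(x, y, bins=10):
    """Histogram intersection of two samples over common, equal-width bins.

    Returns a value in [0, 1]: 1 for identical histograms, 0 for disjoint ones.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # guard against all values being equal

    def rel_freq(values):
        counts = [0] * bins
        for v in values:
            # Clamp so that the maximum value falls in the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
        return [c / len(values) for c in counts]

    px, py = rel_freq(x), rel_freq(y)
    # Shared probability mass, bin by bin.
    return sum(min(a, b) for a, b in zip(px, py))
```

          A feature whose class histograms give a value close to 1 is, in the sense of the
          paragraph above, of little help for separating those classes.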