operation by observing the output voltage of a microphone over a period of time. This problem reduces to discrimination of waveforms from good and bad machines. On the other hand, recognition of printed English characters corresponds to classification of geometric figures. In order to perform this type of classification, we must first measure the observable characteristics of the sample. The most primitive but assured way to extract all information contained in the sample is to measure the time-sampled values for a waveform, x(t_1), . . . , x(t_n), and the grey levels of pixels for a figure, x(1), . . . , x(n), as shown in Fig. 1-1. These n measurements form a vector X. Even under the normal machine condition, the observed waveforms differ each time the observation is made. Therefore, x(t_i) is a random variable and will be expressed, using boldface, as x(t_i). Likewise, X is called a random vector if its components are random variables, and is expressed as X. Similar arguments hold for characters: the observation x(i) varies from one A to another, and therefore x(i) is a random variable and X is a random vector.
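As a concrete illustration (not from the text), the following minimal Python sketch forms the n-dimensional vector X from time-sampled values of a waveform; the sinusoidal signal, sampling instants, and noise level are all invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def observe_waveform(n=8, noise=0.1):
        # Sample the microphone output at n instants t_1, ..., t_n.
        # The fixed sinusoid stands in for the "normal machine" signal;
        # the additive noise models the observation-to-observation
        # variation that makes each x(t_i) a random variable.
        t = np.linspace(0.0, 1.0, n)
        signal = np.sin(2.0 * np.pi * 5.0 * t)
        return signal + noise * rng.standard_normal(n)  # X = (x(t_1), ..., x(t_n))

    X1 = observe_waveform()
    X2 = observe_waveform()
    print(X1)        # one realization of the random vector X
    print(X1 - X2)   # nonzero: repeated observations differ

Two calls produce different vectors even though the underlying machine condition is the same, which is exactly why X must be treated as a random vector.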
Thus, each waveform or character is expressed by a vector (or a sample) in an n-dimensional space, and many waveforms or characters form a distribution of X in the n-dimensional space. Figure 1-2 shows a simple two-dimensional example of two distributions corresponding to normal and abnormal machine conditions, where points depict the locations of samples and solid lines are the contour lines of the probability density functions. If we know these two distributions of X from past experience, we can set up a boundary between these two distributions, g(x_1, x_2) = 0, which divides the two-dimensional space into two regions. Once the boundary is selected, we can classify a sample without a class label as a normal or abnormal machine, depending on whether g(x_1, x_2) < 0 or g(x_1, x_2) > 0. We call g(x_1, x_2) a discriminant function, and a network which detects the sign of g(x_1, x_2) is called a pattern recognition network, a categorizer, or a classifier. Figure 1-3 shows a block diagram of a classifier in a general n-dimensional space. Thus, in order to design a classifier, we must study the characteristics of the distribution of X for each category and find a proper discriminant function. This process is called learning or training, and samples used to design a classifier are called learning or training samples. The discussion can be easily extended to multi-category cases.
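To make this design procedure concrete, here is a minimal sketch (not from the text; the class means, covariances, and sample sizes are invented) that "learns" a Gaussian density for each class from training samples and uses the log-likelihood ratio as the discriminant function g(x_1, x_2):

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(1)

    # Hypothetical training samples from the two machine conditions.
    normal_train = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
    abnormal_train = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=200)

    # "Learning": estimate each class density from its training samples.
    p_normal = multivariate_normal(normal_train.mean(axis=0), np.cov(normal_train.T))
    p_abnormal = multivariate_normal(abnormal_train.mean(axis=0), np.cov(abnormal_train.T))

    def g(x):
        # Discriminant function: g(x) < 0 -> normal, g(x) > 0 -> abnormal;
        # g(x) = 0 is the boundary between the two distributions.
        return p_abnormal.logpdf(x) - p_normal.logpdf(x)

    x_new = np.array([0.3, -0.2])                    # sample without a class label
    print("abnormal" if g(x_new) > 0 else "normal")  # the classifier detects the sign of g

With equal prior probabilities this sign test is the Bayes decision rule for the estimated densities; any other function with the same zero set would define the same boundary.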
Thus, pattern recognition, or decision-making in a broader sense, may be considered as a problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes. Because