     (1) We can adjust the V's and v_0's so as to minimize E of (4.157). Since it is difficult to get an explicit mathematical expression for E, the error must be calculated numerically each time the V's and v_0's are adjusted. When X is distributed normally for all classes, some simplification can be achieved, since the h's are also normally distributed and p(h_1, . . . , h_{L-1} | ω_i) is given by an explicit mathematical expression. Even for this case, the integration of an (L-1)-dimensional normal distribution in the first quadrant must be carried out numerically, using techniques such as the Monte Carlo method.
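
As an illustration of the numerical evaluation in (1), the first-quadrant integral of an (L-1)-dimensional normal density can be estimated by simple Monte Carlo sampling. The following Python sketch is illustrative only; the mean vector and covariance matrix are assumed values, not taken from the text.

    import numpy as np

    def first_quadrant_probability(mean, cov, n_samples=100_000, seed=0):
        # Monte Carlo estimate of P(h_1 > 0, ..., h_{L-1} > 0) for
        # h ~ N(mean, cov), i.e., the integral of the normal density
        # over the first quadrant.
        rng = np.random.default_rng(seed)
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        return np.mean(np.all(samples > 0, axis=1))

    # Example with L = 3 classes, hence two discriminant outputs
    mean = np.array([1.0, 0.5])                  # assumed means of the h's
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])     # assumed covariance
    print(first_quadrant_probability(mean, cov))

The estimate converges at the usual O(1/sqrt(n_samples)) Monte Carlo rate regardless of L, which is what makes the method attractive for higher-dimensional integrations.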

     (2) Design a linear discriminant function between each pair of classes according to one of the methods discussed previously for two-class problems, so that L(L-1)/2 discriminant functions are calculated. Then, use them as a piecewise linear discriminant function without further modification. When each class distribution is quite different from the others, further modification can result in less error. However, in many applications, the decrease in error from further adjustment of the V's and v_0's is found to be relatively minor.
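
A minimal sketch of this pairwise design follows, assuming a Fisher-type discriminant for each pair of classes and a majority vote to combine the L(L-1)/2 outputs; the voting rule and all function names are illustrative choices, not the text's prescription.

    import numpy as np
    from itertools import combinations

    def fisher_pair(Xa, Xb):
        # Fisher linear discriminant between two classes:
        # V = Sw^{-1} (Ma - Mb), threshold at the midpoint projection.
        Ma, Mb = Xa.mean(axis=0), Xb.mean(axis=0)
        Sw = np.cov(Xa.T) + np.cov(Xb.T)
        V = np.linalg.solve(Sw, Ma - Mb)
        v0 = -0.5 * V @ (Ma + Mb)
        return V, v0

    def piecewise_linear_classify(X, class_data):
        # class_data maps each class label to its (samples x dims) array.
        # Each of the L(L-1)/2 pairwise discriminants casts one vote.
        labels = sorted(class_data)
        votes = np.zeros((len(X), len(labels)), dtype=int)
        for (i, a), (j, b) in combinations(list(enumerate(labels)), 2):
            V, v0 = fisher_pair(class_data[a], class_data[b])
            side = X @ V + v0 > 0        # True votes for a, False for b
            votes[side, i] += 1
            votes[~side, j] += 1
        return [labels[k] for k in votes.argmax(axis=1)]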

     (3) We can assign the desired output γ(X) for a piecewise linear discriminant function and minimize the mean-square error between the desired and actual outputs in order to find the optimum V's and v_0's. The desired outputs could be fixed, or could be adjusted as variables with constraints. Unfortunately, even for piecewise linearly separable data, there is no proof of convergence.
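
A minimal sketch of the fixed-desired-output variant follows, assuming the common one-of-L choice for the desired outputs (an illustrative assumption, not the text's prescription); with fixed targets the mean-square error is minimized in closed form by least squares rather than by iteration.

    import numpy as np

    def mse_linear_discriminants(X, y, L):
        # Fit V_i, v_{i0} for i = 1, ..., L by minimizing the
        # mean-square error between the actual outputs V_i^T X + v_{i0}
        # and fixed one-of-L desired outputs (illustrative choice).
        n = len(X)
        Z = np.hstack([X, np.ones((n, 1))])   # augment inputs with 1
        Gamma = np.zeros((n, L))              # desired outputs
        Gamma[np.arange(n), y] = 1.0
        W, *_ = np.linalg.lstsq(Z, Gamma, rcond=None)
        return W[:-1], W[-1]                  # V (one column per class), v0

    def classify(X, V, v0):
        # Assign X to the class with the largest actual output.
        return np.argmax(X @ V + v0, axis=1)

When the desired outputs are instead treated as adjustable variables with constraints, this closed form is lost and an iterative procedure is needed, which is where the convergence difficulty noted above arises.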




                    Binary Inputs


                         In  Section  4.1,  we  showed  that  for independent  binary  inputs  the  Bayes
                    classifier  becomes  linear.  In  this  section,  we  will  discuss  other properties  of
                    binary inputs.
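    The Section 4.1 result can be made concrete. For independent binary inputs with P(x_k = 1 | ω_1) = p_k and P(x_k = 1 | ω_2) = q_k, the log-likelihood ratio is

    h(X) = ln p(X|ω_1) - ln p(X|ω_2)
         = Σ_k x_k ln [ p_k (1 - q_k) / ( q_k (1 - p_k) ) ] + Σ_k ln [ (1 - p_k) / (1 - q_k) ],

which is linear in X. A minimal Python sketch (the p_k and q_k values are illustrative assumptions):

    import numpy as np

    def bayes_linear_weights(p, q):
        # Weights of the linear Bayes classifier h(X) = V^T X + v0
        # for independent binary inputs.
        p, q = np.asarray(p), np.asarray(q)
        V = np.log(p * (1 - q) / (q * (1 - p)))
        v0 = np.sum(np.log((1 - p) / (1 - q)))
        return V, v0

    p = [0.8, 0.6, 0.7]      # assumed P(x_k = 1 | class 1)
    q = [0.3, 0.4, 0.2]      # assumed P(x_k = 1 | class 2)
    V, v0 = bayes_linear_weights(p, q)
    X = np.array([1, 0, 1])
    print(V @ X + v0)        # decide class 1 when this exceeds ln(P_2/P_1)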
    When we have n binary inputs forming an input vector X, the number of all possible inputs is 2^n, {X_0, . . . , X_{2^n - 1}} [see Table 4-2 for example]. Then the components of X_i, x_k (k = 1, . . . , n), satisfy