(d)  L(ℓ) is a penalty vector whose components are functions of the corresponding components of Γ(ℓ).
A different approach is to treat the problem of finding a feasible solution of (4.73) as a linear programming problem with an artificially created cost vector.  For this approach, the reader is referred to a text on linear programming.
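As a sketch of how such a linear programming formulation might look (my own illustration, not a formulation from the text): assuming (4.73) denotes the strict inequalities z_i^T W > 0 for the sign-adjusted, augmented samples z_i, one artificial cost that works is to maximize a common margin t subject to z_i^T W >= t; a positive optimal t then certifies a feasible W.  The function name, the margin cap, and the variable box below are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

def feasible_w_via_lp(Z):
    """Look for W with Z @ W > 0 for every row of Z (sign-adjusted, augmented
    samples) by maximizing a common margin t with an artificial cost vector.
    Returns (W, t); t > 0 means the strict inequalities are satisfiable."""
    N, d = Z.shape
    # Variables x = [W_1, ..., W_d, t].  linprog minimizes c @ x,
    # so c = [0, ..., 0, -1] maximizes t.
    c = np.zeros(d + 1)
    c[-1] = -1.0
    # -Z W + t <= 0   <=>   z_i^T W >= t  for every sample.
    A_ub = np.hstack([-Z, np.ones((N, 1))])
    b_ub = np.zeros(N)
    # Box the variables so the LP stays bounded; any W with a positive margin
    # can be rescaled into the box, so feasibility is unaffected.
    bounds = [(-100.0, 100.0)] * d + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]
```

Since W = 0, t = 0 is always feasible, the optimal t is zero exactly when no solution of the strict inequalities exists, and positive when one does.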
A word of caution is in order here.  In addition to their complexity, all of the above approaches share a more fundamental disadvantage.  In (4.82) and (4.83), for example, the classifier is designed based only on the misclassified samples in the boundary region.  For a good classifier the number of misclassified samples tends to be small, and it is sometimes questionable whether these samples represent the true statistics of the boundary structure.  As a result, the resubstitution error, obtained by using the same sample set for both design and test, tends to be severely biased toward the optimistic side.  It is therefore advisable that independent samples always be used to test the performance of the classifier.
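The optimistic bias is easy to reproduce numerically.  The sketch below is my own generic illustration (a least-squares linear classifier on two Gaussian classes, not the procedures of (4.82) and (4.83)): the error counted on the design samples themselves typically comes out noticeably smaller than the error on a large independent test set.  All sample sizes and class parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n_per_class):
    """Two overlapping 5-dimensional Gaussian classes with labels +1 / -1."""
    X = np.vstack([rng.normal(+0.8, 1.0, size=(n_per_class, 5)),
                   rng.normal(-0.8, 1.0, size=(n_per_class, 5))])
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

def error_rate(W, X, y):
    Xa = np.hstack([X, np.ones((len(X), 1))])      # augment with a bias term
    return np.mean(np.sign(Xa @ W) != y)

# Design a linear classifier by least squares on a small design set.
X_design, y_design = sample(20)
Xa = np.hstack([X_design, np.ones((len(X_design), 1))])
W = np.linalg.lstsq(Xa, y_design, rcond=None)[0]

X_test, y_test = sample(5000)                      # independent test samples
print("resubstitution error  :", error_rate(W, X_design, y_design))
print("independent-test error:", error_rate(W, X_test, y_test))
```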

An iterative process and its convergence:  In order to see how the iterative process works, let us consider the third criterion of (4.76), in which the γ(z_i) are adjusted along with W under the constraint γ(z_i) > 0.  Also, let us assume that our coordinate system has already been transformed to whiten the sample covariance matrix, such that

          U U^T = N I .                                                  (4.86)
                      Since the result of  the procedure should not depend on  the coordinate system,
                      this  transformation simplifies the  discussion without loss of  generality.  Then
the mean-square error becomes

          \bar{\varepsilon}^2 = \frac{1}{N} (U^T W - \Gamma)^T (U^T W - \Gamma)
                              = W^T W - \frac{2}{N} W^T U \Gamma + \frac{1}{N} \Gamma^T \Gamma ,        (4.87)

where Γ = [γ(z_1) ... γ(z_N)]^T is the vector of desired outputs.  The gradients of ε̄² with respect to W and Γ are

          \frac{\partial \bar{\varepsilon}^2}{\partial W} = 2 \left( W - \frac{1}{N} U \Gamma \right) ,        (4.88)

          \frac{\partial \bar{\varepsilon}^2}{\partial \Gamma} = - \frac{2}{N} \left( U^T W - \Gamma \right) .        (4.89)
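As a quick numerical check of (4.86)-(4.89) (my own sketch, not part of the text), the following code whitens a random sample matrix so that U U^T = N I holds, then compares the closed-form gradients with central finite differences of the mean-square error.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 20
U = rng.normal(size=(d, N))                  # columns play the role of samples z_i

# Whiten so that U U^T = N I, as assumed in (4.86).
S = U @ U.T / N
w_eval, w_evec = np.linalg.eigh(S)
U = w_evec @ np.diag(w_eval ** -0.5) @ w_evec.T @ U
assert np.allclose(U @ U.T, N * np.eye(d))

W = rng.normal(size=d)
G = rng.uniform(0.1, 1.0, size=N)            # Gamma: desired outputs gamma(z_i) > 0

def mse(W, G):                               # eq. (4.87)
    e = U.T @ W - G
    return e @ e / N

grad_W = 2 * (W - U @ G / N)                 # eq. (4.88)
grad_G = -2 * (U.T @ W - G) / N              # eq. (4.89)

eps = 1e-6                                   # central finite differences
fd_W = np.array([(mse(W + eps * np.eye(d)[i], G) - mse(W - eps * np.eye(d)[i], G))
                 / (2 * eps) for i in range(d)])
fd_G = np.array([(mse(W, G + eps * np.eye(N)[i]) - mse(W, G - eps * np.eye(N)[i]))
                 / (2 * eps) for i in range(N)])
assert np.allclose(grad_W, fd_W, atol=1e-5)
assert np.allclose(grad_G, fd_G, atol=1e-5)
```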


In order to satisfy the constraint γ(z_i) > 0, a modification is made as follows: