
4.1 Linear Discriminants

S_B = (m_1 - m_2)(m_1 - m_2)' , the between-class scatter matrix.   (4-9b)


Notice that S_W is directly related, indeed proportional, to the pooled covariance matrix. Both S_W and S_B are symmetric positive semidefinite matrices.
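As an illustration (not taken from the book), the following NumPy sketch computes both scatter matrices for two classes from hypothetical sample matrices X1 and X2 (one row per pattern); S_W is taken here as the sum of the per-class scatter matrices, which is proportional to the pooled covariance matrix, and S_B follows (4-9b).

import numpy as np

def scatter_matrices(X1, X2):
    # Class means.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices,
    # proportional to the pooled covariance matrix.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Between-class scatter (4-9b): outer product of the mean difference.
    d = (m1 - m2).reshape(-1, 1)
    S_B = d @ d.T
    return S_W, S_B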
The goal is to choose a direction in the feature space along which the distance of the means relative to the within-class variance reaches a maximum, thereby maximizing the class separability. This corresponds to maximizing the following criterion function:

J(x) = (x' S_B x) / (x' S_W x) .                                 (4-10)

                            The direction x that maximizes J(x) can be shown to be:

x = S_W^{-1} (m_1 - m_2) .                                       (4-10a)

                            The reader may find the demonstration of this important result in Duda and Hart
                          (1973), where the generalization for c classes, yielding c-1 independent directions,
                           is also explained.
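A minimal sketch of result (4-10a), assuming the same hypothetical sample matrices X1 and X2 as above; the direction is obtained by solving S_W x = m_1 - m_2 rather than inverting S_W explicitly.

import numpy as np

def fisher_direction(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Closed-form Fisher direction (4-10a); any positive rescaling of the
    # result maximizes J(x) equally well.
    return np.linalg.solve(S_W, m1 - m2)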
For the two-class case, an important result of this many-to-one mapping is that it yields the same direction as the linear discriminant for equal covariances with the Mahalanobis metric, expressed by formulas (4-5b) and (4-5c)! This discriminant is provably optimal, in a specific sense, for symmetric distributions of the feature vectors (e.g. normal distributions), as will be explained in the next section.
Let us compute the Fisher discriminant for the two classes of the cork stoppers data. Using C^{-1} as given in (4-8) and the difference of the means [-24.46  -27.78]', the Fisher discriminant is the vector computed in the previous section, x = [0.18  -0.376]', corresponding to the solid line of Figure 4.11. Projecting the points along this direction, we create a new feature, FISHER = 0.18·N - 0.376·PRT10. Using the Mahalanobis threshold for this new feature, the same classification results as in Figure 4.9 are obtained (see Exercise 4.6).
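The projection step itself is a single inner product per pattern. A sketch, assuming hypothetical arrays N and PRT10 holding the two features (the feature values below are made up; only the coefficients 0.18 and -0.376 come from the text):

import numpy as np

N = np.array([60.0, 75.0, 52.0])       # hypothetical values of feature N
PRT10 = np.array([35.8, 49.1, 29.5])   # hypothetical values of feature PRT10
FISHER = 0.18 * N - 0.376 * PRT10      # new one-dimensional feature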
The Fisher linear discriminant can also be obtained through the minimization of the following mean squared error:

E = (1/n) Σ_{i=1}^{n} ( d(x_i) - t_i )^2 ,                       (4-11)

where d(x_i) is the linear discriminant output for pattern x_i and t_i are the target classification values. This is equivalent to finding the linear regression surface for the dataset. As a matter of fact, it is possible to estimate posterior probabilities of feature vector assignment to a class and determine decision functions using regression techniques (see e.g. Cherkassky and Mulier, 1998). Classification and regression tasks are, therefore, intimately related to each other. In the following chapter on neural networks we will have the opportunity to use the error measure (4-11) again.
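To illustrate the equivalence, the following sketch (an assumption-laden example, not the book's code: synthetic Gaussian data and targets t_i = +1 for class 1 and -1 for class 2) fits a least-squares linear regression to the class targets and compares the resulting weight vector with the Fisher direction; the two agree up to a positive scale factor.

import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # synthetic class 1 patterns
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # synthetic class 2 patterns

# Least-squares fit of w'x + w0 to targets +1 (class 1) and -1 (class 2).
X = np.vstack([X1, X2])
t = np.hstack([np.ones(len(X1)), -np.ones(len(X2))])
A = np.hstack([X, np.ones((len(X), 1))])         # augment with a bias column
w_aug, *_ = np.linalg.lstsq(A, t, rcond=None)
w_reg = w_aug[:2]                                # drop the bias weight

# Fisher direction for comparison.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m1 - m2)

# Both unit vectors point the same way.
print(w_reg / np.linalg.norm(w_reg))
print(w_fisher / np.linalg.norm(w_fisher))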