S_B = (m_1 - m_2)(m_1 - m_2)' , the between-class scatter matrix. (4-9b)
Notice that S_W is directly related, indeed proportional, to the pooled covariance
matrix. Both S_W and S_B are symmetric positive semidefinite matrices.
The goal is to choose a direction in the feature space along which the distance
between the class means, measured relative to the within-class variance, reaches a
maximum, thereby maximizing the class separability. This corresponds to
maximizing the following criterion function:

J(x) = (x' S_B x) / (x' S_W x). (4-10)
The direction x that maximizes J(x) can be shown to be:

x = S_W^{-1}(m_1 - m_2). (4-10a)
The reader can find the derivation of this important result in Duda and Hart
(1973), where the generalization to c classes, yielding c - 1 independent directions,
is also explained.
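As a concrete illustration, the following short NumPy sketch computes S_W, S_B and the maximizing direction (4-10a) for two synthetic classes. The data and variable names are illustrative assumptions for this example only; they are not taken from the book.

import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))   # class 1 samples
X2 = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(50, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices
# (proportional to the pooled covariance matrix).
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Between-class scatter: a rank-one matrix built from the mean difference.
d = m1 - m2
S_B = np.outer(d, d)

# Direction maximizing J(x) = (x' S_B x) / (x' S_W x).
x = np.linalg.solve(S_W, d)

J = (x @ S_B @ x) / (x @ S_W @ x)
print("Fisher direction:", x, " J(x) =", J)

Note that any rescaling of x leaves J(x) unchanged: only the direction matters, which is why x is determined up to a multiplicative constant.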
For the two-class case, an important result of this many-to-one mapping is that it
yields the same direction as the linear discriminant for equal covariances with the
Mahalanobis metric, expressed by formulas (4-5b) and (4-5c)! This discriminant can
be proved optimal, in a specific sense, for symmetric distributions of the feature
vectors (e.g. normal distributions), as will be explained in the next section.
Let us compute the Fisher discriminant for the two classes of the cork stoppers
data. Using C^{-1} as given in (4-8) and the difference of the means, [-24.46 -27.78]',
the Fisher discriminant is the vector computed in the previous section, x = [0.18
-0.376]', corresponding to the solid line of Figure 4.11. Projecting the points along
this direction we create a new feature, FISHER = 0.18×N - 0.376×PRT10. Using the
Mahalanobis threshold for this new feature, the same classification results as in
Figure 4.9 are obtained (see Exercise 4.6).
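The projection itself is just an inner product with the discriminant vector. The following sketch computes the new FISHER feature using the direction quoted above; the sample values of N and PRT10 are hypothetical placeholders, not actual cork-stoppers measurements.

import numpy as np

x = np.array([0.18, -0.376])          # Fisher direction quoted in the text

# Hypothetical [N, PRT10] feature vectors for a few cork stoppers
# (placeholder numbers, not the real dataset).
samples = np.array([
    [ 60.0, 100.0],
    [ 80.0, 150.0],
    [110.0, 230.0],
])

fisher = samples @ x                  # FISHER = 0.18*N - 0.376*PRT10
print("FISHER values:", fisher)

Classification then reduces to thresholding this single feature, for instance with the Mahalanobis threshold mentioned above.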
The Fisher linear discriminant can also be obtained through the minimization of
the following mean squared error:

E = Σ_{i=1}^{n} (w'y_i + w_0 - t_i)^2 , (4-11)

where the y_i are the feature vectors, w and w_0 are the discriminant weights, and
the t_i are the target classification values. This is equivalent to finding the linear
regression surface for the dataset. As a matter of fact, it is possible to estimate
posterior probabilities of feature vector assignment to a class and determine
decision functions using regression techniques (see e.g. Cherkassky and Mulier,
1998). Classification and regression tasks are, therefore, intimately related to each
other. In the following chapter on neural networks we will have the opportunity to
use the error measure (4-11) again.
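To make the regression connection concrete, the sketch below fits an ordinary least-squares solution with targets t_i = ±1 (equal class sizes assumed) and checks that the resulting weight vector is parallel to S_W^{-1}(m_1 - m_2). The data are synthetic; the equivalence, up to a scale factor, is the standard result referred to in the text, not a quotation from the book.

import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))
X2 = rng.normal([2.0, 1.0], 1.0, size=(60, 2))

X = np.vstack([X1, X2])
t = np.concatenate([np.ones(60), -np.ones(60)])   # targets t_i = +/-1

# Least squares on [X, 1] minimizes sum_i (w'y_i + w0 - t_i)^2, i.e. (4-11).
A = np.hstack([X, np.ones((X.shape[0], 1))])
w_full, *_ = np.linalg.lstsq(A, t, rcond=None)
w = w_full[:2]                                    # drop the bias term

# Fisher direction for comparison.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
fisher = np.linalg.solve(S_W, m1 - m2)

# The two normalized vectors should coincide (parallel directions).
print("regression w:", w / np.linalg.norm(w))
print("Fisher x    :", fisher / np.linalg.norm(fisher))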