Linearly separable cases: When there exists a linear classifier which separates two distributions without error, we call the case linearly separable. We will prove here that Γ = −|Γ| never happens in linearly separable cases. This is done by establishing a contradiction as follows.

     For a linearly separable case, there exists a W* for a given U which satisfies

          UᵀW* > 0 .                                            (4.100)

Therefore, if Γ = −|Γ| (with Γ ≠ 0) occurs at the ℓth iterative step, then, since every component of UᵀW* is positive while every component of Γ is nonpositive,

          ΓᵀUᵀW* = (UΓ)ᵀW* < 0 .                                (4.101)
On the other hand, using (4.91), (4.86), and (4.93), UΓ can be obtained as

          UΓ = 0 .                                              (4.102)


This contradicts (4.101), and Γ = −|Γ| cannot happen.

     Thus, the inequality of (4.99) holds only when ‖Γ‖² = 0. That is, ‖Γ(ℓ)‖² continues to decrease monotonically with ℓ, until ‖Γ‖² equals zero.
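     The forms of (4.86), (4.91), and (4.93) are not reproduced on this page, so the following NumPy sketch is only a rough numerical illustration under an assumed standard minimum mean-square-error setup: W = (UUᵀ)⁻¹Uγ for a desired output vector γ, with error vector Γ = UᵀW − γ, which gives UΓ = 0 as in (4.102). The data, the choice γ = 1, and all variable names are illustrative, not from the text; the check simply shows that, for a linearly separable sample set, Γ = −|Γ| with Γ ≠ 0 does not occur.

import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable classes in two dimensions (illustrative data only).
X1 = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))

# Augmented sample matrix U: each column is (1, x), with class-2 samples negated
# so that correct classification of every sample means UᵀW > 0 componentwise.
Z1 = np.hstack([np.ones((len(X1), 1)), X1])
Z2 = -np.hstack([np.ones((len(X2), 1)), X2])
U = np.vstack([Z1, Z2]).T                       # shape (3, N)

gamma = np.ones(U.shape[1])                     # assumed desired output (unit margins)

# Assumed minimum-MSE weight vector: W = (U Uᵀ)⁻¹ U γ.
W = np.linalg.solve(U @ U.T, U @ gamma)

Gamma = U.T @ W - gamma                         # error vector Γ = UᵀW − γ

print("U Γ ≈ 0 :", np.allclose(U @ Gamma, 0.0))              # analogue of (4.102)
print("Γ = −|Γ| with Γ ≠ 0 ?",
      bool(np.all(Gamma <= 1e-12)) and not np.allclose(Gamma, 0.0))  # expected: False
print("min of UᵀW =", (U.T @ W).min())                       # > 0 if W separates the data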


                    4.3  Quadratic Classifier Design

     When the distributions of X are normal for both ω₁ and ω₂, the Bayes discriminant function becomes the quadratic equation of (4.1). Even for non-normal X, the quadratic classifier is a popular one: it works well for many applications. Conceptually, it is easy to accept that the classification be made by comparing the normalized distances (X−Mᵢ)ᵀΣᵢ⁻¹(X−Mᵢ) with a proper threshold.
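     As a concrete sketch of this idea, the code below estimates Mᵢ and Σᵢ from samples and compares the two normalized distances, adjusted by ln|Σᵢ|, against a threshold. Equation (4.1) itself is not reproduced on this page, so the exact discriminant form used here (the usual normal-based quadratic discriminant with threshold ln(P₁/P₂)) and the synthetic data are assumptions for illustration.

import numpy as np

def fit_quadratic_classifier(X1, X2):
    """Plug-in estimates: sample mean M_i and sample covariance S_i for each class."""
    M1, M2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    return (M1, S1), (M2, S2)

def quadratic_discriminant(x, M1, S1, M2, S2):
    """Assumed form: h(x) = ½(x−M1)ᵀS1⁻¹(x−M1) − ½(x−M2)ᵀS2⁻¹(x−M2) + ½ ln(|S1|/|S2|).

    Decide class 1 when h(x) < t for a threshold t, e.g. t = ln(P1/P2)."""
    d1, d2 = x - M1, x - M2
    q1 = d1 @ np.linalg.solve(S1, d1)           # normalized distance to class 1
    q2 = d2 @ np.linalg.solve(S2, d2)           # normalized distance to class 2
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (q1 - q2) + 0.5 * (logdet1 - logdet2)

# Illustrative use on synthetic normal samples (not data from the text).
rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X2 = rng.multivariate_normal([2, 2], [[2.0, -0.5], [-0.5, 1.5]], size=200)
(M1, S1), (M2, S2) = fit_quadratic_classifier(X1, X2)

x = np.array([1.0, 1.0])
label = 1 if quadratic_discriminant(x, M1, S1, M2, S2) < 0.0 else 2   # equal priors: t = 0
print("assigned class:", label)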
     However, very little is known about how to design a quadratic classifier, except for estimating Mᵢ and Σᵢ and inserting these estimates into (4.1). Also, quadratic classifiers may have a severe disadvantage in that they tend to have significantly larger biases than linear classifiers, particularly when the number