Page 171 - Introduction to Statistical Pattern Recognition
4 Parametric Classifiers 153
Linearly separable cases: When there exists a linear classifier that separates two distributions without error, the case is called linearly separable. We will prove here that C = -|C| never happens in linearly separable cases. This is done by establishing a contradiction as follows.
For a linearly separable case, there exists a W* for a given U which satisfies
U^T W* > 0 .  (4.100)
Therefore, if C = -|C| (with C ≠ 0) occurs at the ℓth iterative step,
C^T (U^T W*) = (UC)^T W* < 0 .  (4.101)
On the other hand, using (4.91), (4.86), and (4.93), UC can be obtained as
UC = 0 .  (4.102)
This contradicts (4.101), and C = -|C| cannot happen.
Thus, the inequality of (4.99) holds only when ||C||² = 0. That is, ||C(ℓ)||² continues to decrease monotonically with ℓ, until ||C||² equals zero.
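The separability condition (4.100) can be illustrated numerically. The sketch below uses a simple fixed-increment (perceptron-type) search to find a W* satisfying u^T W* > 0 for every sign-adjusted, augmented sample u; the data, the function name, and the search procedure are illustrative assumptions, not the book's algorithm.

```python
# Illustrative sketch of condition (4.100): for a linearly separable set,
# some W* gives u^T W* > 0 for every sign-adjusted, augmented sample u.
# The fixed-increment search and the data below are hypothetical.

def find_separating_w(samples, labels, max_epochs=1000):
    """Search for W* with u^T W* > 0 for all u = label * (1, x1, x2)."""
    us = [[y * v for v in (1.0,) + tuple(x)] for x, y in zip(samples, labels)]
    w = [0.0] * len(us[0])
    for _ in range(max_epochs):
        updated = False
        for u in us:
            if sum(ui * wi for ui, wi in zip(u, w)) <= 0:
                w = [wi + ui for wi, ui in zip(w, u)]  # move w toward u
                updated = True
        if not updated:            # every u^T w > 0: (4.100) holds
            return w, us
    raise RuntimeError("no separating W* found (data may not be separable)")

# Two separable clusters in the plane (hypothetical data).
samples = [(2.0, 2.0), (3.0, 1.5), (-2.0, -1.0), (-3.0, -2.0)]
labels = [+1, +1, -1, -1]
w_star, us = find_separating_w(samples, labels)
assert all(sum(ui * wi for ui, wi in zip(u, w_star)) > 0 for u in us)
```

For such data the search terminates with a W* verifying (4.100) componentwise, which is the premise from which the contradiction with (4.102) is derived.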
4.3 Quadratic Classifier Design
When the distributions of X are normal for both ω₁ and ω₂, the Bayes discriminant function becomes the quadratic equation of (4.1). Even for non-normal X, the quadratic classifier is a popular one, and it works well for many applications. Conceptually, it is easy to accept that the classification is made by comparing the normalized distances (X - M_i)^T Σ_i^(-1) (X - M_i) with a proper threshold.
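A minimal sketch of this distance-comparison rule, for two-dimensional X: assign X to the class with the smaller normalized distance (X - M_i)^T Σ_i^(-1) (X - M_i), with a threshold on the difference. The means, covariances, and function names below are hypothetical illustrations.

```python
# Sketch of classification by comparing normalized distances
# (X - M_i)^T S_i^{-1} (X - M_i); S_i stands in for the covariance Σ_i.
# All parameters below are hypothetical.

def inv2(S):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def normalized_distance(x, m, S):
    """(x - m)^T S^{-1} (x - m) for 2-D x."""
    d = [x[0] - m[0], x[1] - m[1]]
    Si = inv2(S)
    return sum(d[i] * Si[i][j] * d[j] for i in range(2) for j in range(2))

def classify(x, m1, S1, m2, S2, threshold=0.0):
    """Return class 1 or 2 by comparing the two normalized distances."""
    diff = normalized_distance(x, m1, S1) - normalized_distance(x, m2, S2)
    return 1 if diff < threshold else 2

# Hypothetical class parameters.
m1, S1 = (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]
m2, S2 = (4.0, 4.0), [[2.0, 0.0], [0.0, 2.0]]
print(classify((0.5, 0.5), m1, S1, m2, S2))   # → 1 (near M_1)
print(classify((4.5, 3.5), m1, S1, m2, S2))   # → 2 (near M_2)
```

A nonzero threshold plays the role of the "proper threshold" in the text; in the normal case it would absorb the remaining terms of the quadratic discriminant (4.1).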
However, very little is known about how to design a quadratic classifier, except for estimating M_i and Σ_i and inserting these estimates into (4.1). Also, quadratic classifiers may have a severe disadvantage in that they tend to have significantly larger biases than linear classifiers, particularly when the number