Linearly separable cases: When there exists a linear classifier which separates two distributions without error, we call the case linearly separable. We will prove here that Γ = −|Γ| never happens in linearly separable cases. This is done by establishing a contradiction as follows.

     For a linearly separable case, there exists a W* for a given U which satisfies

          UᵀW* > 0 .                                            (4.100)

Therefore, if Γ = −|Γ| (with Γ ≠ 0) occurs at the ℓth iterative step, then, since every component of UᵀW* is positive while every component of Γ is nonpositive,

          ΓᵀUᵀW* = (UΓ)ᵀW* < 0 .                                (4.101)
On the other hand, using (4.91), (4.86), and (4.93), UΓ can be obtained as

          UΓ = 0 .                                              (4.102)


This contradicts (4.101), and Γ = −|Γ| cannot happen.

     Thus, the inequality of (4.99) holds only when ‖Γ‖² = 0. That is, ‖Γ(ℓ)‖² continues to decrease monotonically with ℓ, until ‖Γ‖² equals zero.
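     The forms of (4.86), (4.91), and (4.93) are not reproduced on this page, so the following NumPy sketch is only a rough numerical illustration under an assumed standard minimum mean-square-error setup: W = (UUᵀ)⁻¹Uγ for a desired output vector γ, with error vector Γ = UᵀW − γ, which gives UΓ = 0 as in (4.102). The data, the choice γ = 1, and all variable names are illustrative, not from the text; the check simply shows that, for a linearly separable sample set, Γ = −|Γ| with Γ ≠ 0 does not occur.

import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable classes in two dimensions (illustrative data only).
X1 = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))

# Augmented sample matrix U: each column is (1, x), with class-2 samples negated
# so that correct classification of every sample means UᵀW > 0 componentwise.
Z1 = np.hstack([np.ones((len(X1), 1)), X1])
Z2 = -np.hstack([np.ones((len(X2), 1)), X2])
U = np.vstack([Z1, Z2]).T                       # shape (3, N)

gamma = np.ones(U.shape[1])                     # assumed desired output (unit margins)

# Assumed minimum-MSE weight vector: W = (U Uᵀ)⁻¹ U γ.
W = np.linalg.solve(U @ U.T, U @ gamma)

Gamma = U.T @ W - gamma                         # error vector Γ = UᵀW − γ

print("U Γ ≈ 0 :", np.allclose(U @ Gamma, 0.0))              # analogue of (4.102)
print("Γ = −|Γ| with Γ ≠ 0 ?",
      bool(np.all(Gamma <= 1e-12)) and not np.allclose(Gamma, 0.0))  # expected: False
print("min of UᵀW =", (U.T @ W).min())                       # > 0 if W separates the data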


                    4.3  Quadratic Classifier Design

     When the distributions of X are normal for both ω₁ and ω₂, the Bayes discriminant function becomes the quadratic equation of (4.1). Even for non-normal X, the quadratic classifier is a popular one: it works well for many applications. Conceptually, it is easy to accept that the classification be made by comparing the normalized distances (X−Mᵢ)ᵀΣᵢ⁻¹(X−Mᵢ) with a proper threshold.
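     As a concrete sketch of this idea, the code below estimates Mᵢ and Σᵢ from samples and compares the two normalized distances, adjusted by ln|Σᵢ|, against a threshold. Equation (4.1) itself is not reproduced on this page, so the exact discriminant form used here (the usual normal-based quadratic discriminant with threshold ln(P₁/P₂)) and the synthetic data are assumptions for illustration.

import numpy as np

def fit_quadratic_classifier(X1, X2):
    """Plug-in estimates: sample mean M_i and sample covariance S_i for each class."""
    M1, M2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    return (M1, S1), (M2, S2)

def quadratic_discriminant(x, M1, S1, M2, S2):
    """Assumed form: h(x) = ½(x−M1)ᵀS1⁻¹(x−M1) − ½(x−M2)ᵀS2⁻¹(x−M2) + ½ ln(|S1|/|S2|).

    Decide class 1 when h(x) < t for a threshold t, e.g. t = ln(P1/P2)."""
    d1, d2 = x - M1, x - M2
    q1 = d1 @ np.linalg.solve(S1, d1)           # normalized distance to class 1
    q2 = d2 @ np.linalg.solve(S2, d2)           # normalized distance to class 2
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (q1 - q2) + 0.5 * (logdet1 - logdet2)

# Illustrative use on synthetic normal samples (not data from the text).
rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X2 = rng.multivariate_normal([2, 2], [[2.0, -0.5], [-0.5, 1.5]], size=200)
(M1, S1), (M2, S2) = fit_quadratic_classifier(X1, X2)

x = np.array([1.0, 1.0])
label = 1 if quadratic_discriminant(x, M1, S1, M2, S2) < 0.0 else 2   # equal priors: t = 0
print("assigned class:", label)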
     However, very little is known about how to design a quadratic classifier, except for estimating Mᵢ and Σᵢ and inserting these estimates into (4.1). Also, quadratic classifiers may have a severe disadvantage in that they tend to have significantly larger biases than linear classifiers, particularly when the number