Page 159 - Introduction to Statistical Pattern Recognition

4 Parametric Classifiers                                      141



                        Since Procedures II and III produce different $s$'s, $V$'s, $v_0$'s, and
                    $\varepsilon$'s, we need to know which $s$, $V$, $v_0$, and $\varepsilon$ to use.
                    Once a classifier has been designed by using $N$ samples and implemented, the
                    classifier is supposed to classify samples which were never used in design.
                    Therefore, the error of Procedure III is the one that indicates the performance
                    of the classifier in operation.  However, the error of Procedure III alone does
                    not tell how much the error can be reduced if we use a larger number of design
                    samples.  The error of the ideal classifier, which is designed with an infinite
                    number of design samples, lies somewhere between the errors of Procedures II
                    and III.  Therefore, in order to predict the asymptotic error experimentally, it
                    is common practice to run both Procedures II and III.  As far as the parameter
                    selection of the classifier is concerned, we can get better estimates of these
                    parameters by using a larger number of design samples.  Therefore, if the
                    available sample size is fixed, we should use all samples to design the
                    classifier.  Thus, the $s$, $V$, and $v_0$ obtained by Procedure II are the ones
                    which must be used in classifier design.
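
As an illustrative sketch only, the following Python simulation compares the two error estimates. It assumes, consistent with the description above, that Procedure II tests the classifier on the design samples themselves and Procedure III tests it on independent samples. The nearest-mean linear classifier, the sign convention (class 1 when h(X) < 0), the dimension, the sample sizes, and the class means are all hypothetical choices made for this illustration and are not taken from the text.

```python
import random

def sample(mean, n_samples, rng):
    """Draw n_samples vectors from a normal distribution with identity covariance."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n_samples)]

def design(class1, class2):
    """Nearest-mean linear classifier h(X) = V^T X + v0 (assumed rule: class 1 if h(X) < 0)."""
    dim = len(class1[0])
    m1 = [sum(x[k] for x in class1) / len(class1) for k in range(dim)]
    m2 = [sum(x[k] for x in class2) / len(class2) for k in range(dim)]
    v = [b - a for a, b in zip(m1, m2)]
    # Threshold places the boundary at the midpoint of the two sample means.
    v0 = -sum(vk * (a + b) / 2.0 for vk, a, b in zip(v, m1, m2))
    return v, v0

def error_rate(v, v0, class1, class2):
    """Fraction of misclassified samples, with equal weight on every sample."""
    def h(x):
        return sum(vk * xk for vk, xk in zip(v, x)) + v0
    wrong = sum(h(x) >= 0 for x in class1) + sum(h(x) < 0 for x in class2)
    return wrong / (len(class1) + len(class2))

def run_experiment(trials=100, n_design=10, n_test=200, dim=8, seed=0):
    """Average the design-set error and the independent-test error over many trials."""
    rng = random.Random(seed)
    mean1 = [0.0] * dim
    mean2 = [2.0] + [0.0] * (dim - 1)
    resub_total, holdout_total = 0.0, 0.0
    for _ in range(trials):
        d1, d2 = sample(mean1, n_design, rng), sample(mean2, n_design, rng)
        v, v0 = design(d1, d2)
        # Test on the same samples used for design (assumed Procedure II).
        resub_total += error_rate(v, v0, d1, d2)
        # Test on fresh samples never used in design (assumed Procedure III).
        t1, t2 = sample(mean1, n_test, rng), sample(mean2, n_test, rng)
        holdout_total += error_rate(v, v0, t1, t2)
    return resub_total / trials, holdout_total / trials

resub_error, holdout_error = run_experiment()
print(f"design-set error: {resub_error:.3f}, independent-test error: {holdout_error:.3f}")
```

Averaged over many trials, the design-set error should fall below the independent-test error, and the error of the ideal classifier is expected to lie between the two.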

                         Before leaving this subject, the reader should be reminded that the
                    criteria discussed in this section can be used to evaluate the performance of a
                    linear classifier regardless of whether the classifier is optimum or not.  For a
                    given linear classifier and given test distributions, $\eta_i$ and $\sigma_i^2$
                    are computed from (4.19) and (4.20), and they are inserted into a chosen
                    criterion to evaluate its performance.  When the distributions of $X$ are normal
                    for both $\omega_1$ and $\omega_2$, $h(X)$ becomes normal.  Thus, we can use the
                    error of (4.38).
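
The evaluation just described can be sketched in Python. Equations (4.19), (4.20), and (4.38) are not reproduced on this page, so the sketch assumes the usual forms for a linear discriminant h(X) = V^T X + v0: mean eta_i = V^T M_i + v0 and variance sigma_i^2 = V^T Sigma_i V under class i, with X assigned to class 1 when h(X) < 0. The function names and the example numbers are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(v, v0, mean, cov):
    """Mean and variance of h(X) = V^T X + v0 for one class (assumed forms)."""
    eta = sum(vk * mk for vk, mk in zip(v, mean)) + v0
    n = len(v)
    var = sum(v[j] * cov[j][k] * v[k] for j in range(n) for k in range(n))
    return eta, var

def linear_classifier_error(v, v0, mean1, cov1, mean2, cov2, p1=0.5):
    """Error of a given linear classifier when h(X) is normal under both classes."""
    eta1, var1 = moments(v, v0, mean1, cov1)
    eta2, var2 = moments(v, v0, mean2, cov2)
    err1 = normal_cdf(eta1 / math.sqrt(var1))    # P(h(X) >= 0 | class 1)
    err2 = normal_cdf(-eta2 / math.sqrt(var2))   # P(h(X) <  0 | class 2)
    return p1 * err1 + (1.0 - p1) * err2

identity = [[1.0, 0.0], [0.0, 1.0]]
mean1, mean2 = [0.0, 0.0], [2.0, 0.0]

# A classifier aligned with the mean difference, and a deliberately misaligned one.
opt_error = linear_classifier_error([2.0, 0.0], -2.0, mean1, identity, mean2, identity)
subopt_error = linear_classifier_error([1.0, 1.0], -1.0, mean1, identity, mean2, identity)
print(f"aligned: {opt_error:.4f}, misaligned: {subopt_error:.4f}")
```

The same function scores both classifiers, which reflects the point that the evaluation does not require the classifier to be optimum.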

                    Optimum Design of a Nonlinear Classifier

                         So far, we  have  limited our discussion  to  a  linear classifier.  However,
                    we can extend the previous discussion to a more general nonlinear classifier.


                         General nonlinear classifier: Let $y(X)$ be a general discriminant
                    function with $X$ classified according to


                                                                               (4.47)

                    Also, let $f(\eta_1, \eta_2, \sigma_1^2, \sigma_2^2)$ be the criterion to be optimized with respect to $y(X)$,
                    where