
4.5 Classifier Evaluation   127

                         Influence of finite design set

                         The  estimate  of  the  design  set  error  will  depend  on  the  particular  sample
                         distributions  in  both  classes.  For  normal  distributions,  the  design  set  error  is
                         influenced by the deviation of the sample means and covariances, computed with n
                         design samples, from the true values, resulting in:
                            $$E[\hat{P}e_d(n)] \approx Pe + \frac{v}{n}$$                      (4-46a)




                           Therefore, the variance is zero, but there is a bias v/n, where v is constant for
                         the same classifier and n is the number of design samples used. For the linear
                          normal classifier the bias is approximately proportional to d/n. For the quadratic
                          normal classifier the bias is approximately proportional to d²/n, so it grows
                          quite fast with d. This makes the quadratic classifier more sensitive to parameter
                          estimation errors than the linear one.
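
                          As a rough illustration of these proportionalities, the following Python
                          sketch tabulates d/n and d²/n for a few dimensionalities. The function names
                          are hypothetical, and the proportionality constants, which are not given
                          here, are assumed to be 1, so only the relative growth is meaningful.

```python
def relative_bias(d, n, classifier="linear"):
    """Design set bias up to an unknown constant factor (assumed to be 1 here).

    d: dimensionality, n: number of design samples.
    """
    if classifier == "linear":
        return d / n          # bias roughly proportional to d/n
    if classifier == "quadratic":
        return d ** 2 / n     # bias roughly proportional to d^2/n
    raise ValueError("classifier must be 'linear' or 'quadratic'")


def bias_growth_table(dims, n):
    """Return (d, linear term, quadratic term) for each dimensionality in dims."""
    return [
        (d, relative_bias(d, n, "linear"), relative_bias(d, n, "quadratic"))
        for d in dims
    ]


if __name__ == "__main__":
    for d, lin, quad in bias_growth_table([2, 5, 10, 20], n=200):
        print(f"d={d:2d}  d/n={lin:.3f}  d^2/n={quad:.3f}")
```

                          With n = 200, raising d from 2 to 20 multiplies the linear term by 10 and the
                          quadratic term by 100, which is the faster growth referred to above.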

                            When influences from both the finite design set and the finite test set are taken
                          into account, it is verified that the bias is only influenced by the design set as stated
                          in (4-46a), and the variance is given by:

                            $$\ldots + \frac{Pe_2 (1 - Pe_2)}{n_2} + V[\hat{P}e_d(n)] .$$                      (4-47)

                            The last term on the right hand side is nearly zero for the linear classifier. The
                          variance is thus dominated by the first two terms. These are influenced by  the bias
                          of  the  design  set.  However,  this  influence  is  minimal  and  can  be  neglected.
                          Briefly:

                          - The bias is predominantly influenced by the finiteness of the design set;
                          - The variance is predominantly influenced by the finiteness of the test set.
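
                          The dependence of the variance on the test set size can be checked with a
                          small simulation. The sketch below assumes a fixed classifier whose true
                          error probability is pe, so that the number of errors on n_test independent
                          test samples is binomial and the error estimate has variance
                          pe(1 - pe)/n_test. This is the standard binomial result, not a formula
                          quoted from this passage, and all names in the code are hypothetical.

```python
import random
import statistics


def simulate_test_error_estimates(pe, n_test, n_repeats, seed=0):
    """Error estimates of one fixed classifier over many independent test sets.

    Each test sample is misclassified with probability pe (an assumed true error).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        errors = sum(1 for _ in range(n_test) if rng.random() < pe)
        estimates.append(errors / n_test)
    return estimates


def binomial_variance(pe, n_test):
    """Theoretical variance of the error estimate for a fixed classifier."""
    return pe * (1.0 - pe) / n_test


if __name__ == "__main__":
    pe = 0.1
    for n_test in (25, 100, 400):
        est = simulate_test_error_estimates(pe, n_test, n_repeats=2000)
        print(
            f"n_test={n_test:3d}  mean={statistics.mean(est):.4f}  "
            f"var={statistics.pvariance(est):.6f}  "
            f"theory={binomial_variance(pe, n_test):.6f}"
        )
```

                          The mean of the estimates stays close to pe for every test set size, while
                          their spread shrinks as n_test grows.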

                            In normal practice we only have a pattern  set X with n samples available. The
                          problem arises of how to divide the available patterns into design set and test set.
                          The following alternatives are possible:

                          Resubstitution method

                           The  whole  set  X is  used  for  design,  and  also  for  testing  the  classifier.  As  a
                           consequence of the non-independence of design and test sets, the method yields, on
                           average,  an  optimistic  estimate  of  the  error,  corresponding  to  the  estimate
                           $E[\hat{P}e_d(n)]$ mentioned in section 4.2.4. For the two-class linear discriminant with