
4.2 Bayesian Classification   107






                            Formula (4-30) allows the computation of confidence interval estimates for Pe,
                          by substituting $\hat{P}e$ in place of Pe and using the normal distribution approximation
                          for sufficiently large n (say, n ≥ 25). Notice that they are zero for the extreme cases
                          of Pe=0 or Pe=1. Furthermore, as this formula is independent of the classifier
                          model,  its  value  is  to  be  considered  a  worst-case  value,  yielding  in  many
                          circumstances unrealistically  large intervals.
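The interval computation described above can be sketched as follows. This is a minimal illustration, not the book's own code: it assumes that formula (4-30) is the usual binomial standard deviation sqrt(Pe(1 - Pe)/n), which is consistent with the value being zero at Pe=0 and Pe=1. The function name `error_confidence_interval` and the `confidence` parameter are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_confidence_interval(pe_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for the error probability.

    ASSUMPTION: formula (4-30) is taken to be sigma = sqrt(Pe * (1 - Pe) / n),
    with the estimate pe_hat substituted for the unknown Pe.
    """
    if not 0.0 <= pe_hat <= 1.0:
        raise ValueError("pe_hat must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    sigma = sqrt(pe_hat * (1.0 - pe_hat) / n)
    # Clip to [0, 1], since an error probability cannot leave that range.
    lower = max(0.0, pe_hat - z * sigma)
    upper = min(1.0, pe_hat + z * sigma)
    return lower, upper


if __name__ == "__main__":
    print(error_confidence_interval(0.10, 100))
    print(error_confidence_interval(0.0, 50))  # degenerate case: zero width
```

The normal approximation is only reasonable for sufficiently large n (the text suggests n ≥ 25), so intervals for smaller sets should be read with caution.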
                            In normal practice we compute $\hat{P}e_d$ by designing and evaluating the classifier in
                          the same set with n patterns, $\hat{P}e_d(n)$. This error estimate is related to an empirical
                          risk, mentioned already in section 4.2.1. As for $\hat{P}e_t$, we may compute it using an
                          independent set of n patterns, $\hat{P}e_t(n)$. In order to have some guidance on how to
                          choose an appropriate dimensionality ratio, we would like to know the deviation of
                          the expected values of these estimates from the Bayes error, where the expectation
                          is computed on a population of classifiers of the same type and trained in the same
                          conditions. Formulas for these expectations, $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$, are quite
                          intricate and can only be computed numerically. Like formula  (4-25), they depend
                          on the Bhattacharyya  distance. The bibliography section includes references where
                          these  formulas for  two  classes  with  normal  distributions  can  be  found,  namely
                          Foley (1972) and Raudys and Pikelis (1980). A software tool, PR Size, computing
                           these formulas for the linear discriminant case is included  in  the  CD distributed
                           with the book. PR Size also allows the computation of confidence intervals of these
                           estimates, using (4-30).
                             Figure 4.26 is obtained with PR Size and illustrates how the expected values of
                           the  error  estimates  evolve  with  n  patterns  (assumed  here  to  be  the  number  of
                           patterns in  each class), in the  situation  of  equal covariance. Both curves have an
                           asymptotic behaviour with n → ∞ (see footnote 6), with the average design set error estimate
                           converging to  the  Bayes  error  (related  to  the  optimal  risk)  from  below  and  the
                           average test set error estimate converging from above.
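The behaviour of the two averages can be reproduced qualitatively with a small Monte Carlo experiment. The sketch below is not the numerical method of Foley (1972) or Raudys and Pikelis (1980), nor what PR Size computes. It assumes two normal classes with identity covariance that is known to the designer, so the linear discriminant reduces to a nearest-class-mean rule with estimated means. All names and default values (`simulate`, `delta`, `n_test`, `runs`) are hypothetical.

```python
import random
from statistics import NormalDist, mean


def sample(center, rng):
    """Draw one pattern from a normal class with identity covariance."""
    return [rng.gauss(c, 1.0) for c in center]


def class_mean(points):
    """Componentwise mean of a list of patterns."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]


def error_rate(m0, m1, set0, set1):
    """Error of the nearest-mean linear discriminant on two labelled sets."""
    w = [b - a for a, b in zip(m0, m1)]            # direction m1 - m0
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]    # midpoint of the means

    def score(x):
        # Positive score: x is closer to m1 than to m0.
        return sum(wi * (xi - ci) for wi, xi, ci in zip(w, x, mid))

    errors = sum(score(x) > 0 for x in set0) + sum(score(x) <= 0 for x in set1)
    return errors / (len(set0) + len(set1))


def simulate(d=7, n=7, delta=2.0, n_test=200, runs=300, seed=0):
    """Average design set and test set errors over `runs` trained classifiers.

    n is the number of design patterns per class; delta is the distance
    between the true class means (ASSUMED values, chosen for illustration).
    """
    rng = random.Random(seed)
    mu0 = [0.0] * d
    mu1 = [delta] + [0.0] * (d - 1)
    design, test = [], []
    for _ in range(runs):
        tr0 = [sample(mu0, rng) for _ in range(n)]
        tr1 = [sample(mu1, rng) for _ in range(n)]
        m0, m1 = class_mean(tr0), class_mean(tr1)
        design.append(error_rate(m0, m1, tr0, tr1))   # same set: optimistic
        te0 = [sample(mu0, rng) for _ in range(n_test)]
        te1 = [sample(mu1, rng) for _ in range(n_test)]
        test.append(error_rate(m0, m1, te0, te1))     # independent set
    bayes = NormalDist().cdf(-delta / 2)              # Bayes error for this model
    return mean(design), bayes, mean(test)


if __name__ == "__main__":
    for n in (7, 21, 70):
        print(n, simulate(n=n))
```

Increasing `n` in the loop should bring both averages closer to the Bayes error, the design set average from below and the test set average from above, up to Monte Carlo noise.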
                             Both  standard deviations, which  can  be  inspected in  text  boxes for a  selected
                           value of  n/d, are initially high for low values of n and converge slowly to zero with
                           n → ∞. For the situation shown in Figure 4.26, the standard deviation of $\hat{P}e_d(n)$
                           changes from 0.089 for n=d (14 patterns, 7 per class) to 0.033 for n=10d (140
                           patterns, 70 per class).
                             Based on the behaviour of the $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$ curves some criteria
                           can be established  for the dimensionality ratio.  As a general rule of  thumb, using
                           dimensionality ratios above 3 is recommended.
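As a small worked illustration of the rule of thumb, the helper below computes the dimensionality ratio n/d, with n taken as the number of patterns per class as in the text, and checks it against the threshold of 3. The function names and the `min_ratio` parameter are hypothetical.

```python
from math import floor


def dimensionality_ratio(n_per_class, d):
    """Ratio n/d of patterns per class to number of features."""
    if n_per_class <= 0 or d <= 0:
        raise ValueError("n_per_class and d must be positive")
    return n_per_class / d


def meets_rule_of_thumb(n_per_class, d, min_ratio=3.0):
    """True when the ratio is strictly above the recommended threshold."""
    return dimensionality_ratio(n_per_class, d) > min_ratio


def min_patterns_per_class(d, min_ratio=3.0):
    """Smallest integer n per class whose ratio n/d exceeds min_ratio."""
    return floor(min_ratio * d) + 1


if __name__ == "__main__":
    print(dimensionality_ratio(70, 7))   # the n = 10d case from the text
    print(meets_rule_of_thumb(7, 7))     # the n = d case from the text
    print(min_patterns_per_class(7))
```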







                            6 Numerical approximations in the computation of the average test set error may result in a
                              deviation from this asymptotic behaviour, for sufficiently large n.