Page 264 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

6.3 Bayesian Classification

              Formula 6.28 allows the computation of confidence interval estimates for
           $\hat{P}e$, by substituting $\hat{P}e$ in place of Pe and using the normal
           distribution approximation for sufficiently large n (say, n ≥ 25). Note that
           formula 6.28 yields zero for the extreme cases of Pe = 0 or Pe = 1.
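
The interval computation described above can be sketched in Python. This is a minimal sketch that assumes formula 6.28 has the binomial form σ = sqrt(Pe(1 − Pe)/n), which is consistent with it yielding zero at Pe = 0 and Pe = 1. The function name `error_confidence_interval` and the `level` parameter are illustrative, not taken from the book.

```python
import math
from statistics import NormalDist


def error_confidence_interval(pe_hat, n, level=0.95):
    """Normal-approximation confidence interval for an error estimate.

    Assumes (hypothetically) that formula 6.28 is the binomial standard
    deviation sqrt(Pe * (1 - Pe) / n), with pe_hat substituted for Pe.
    The approximation is only reasonable for sufficiently large n
    (the text suggests n >= 25).
    """
    if not 0.0 <= pe_hat <= 1.0:
        raise ValueError("pe_hat must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided standard normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    sigma = math.sqrt(pe_hat * (1.0 - pe_hat) / n)
    # Clip to [0, 1], since an error probability cannot leave that range.
    return max(0.0, pe_hat - z * sigma), min(1.0, pe_hat + z * sigma)
```

For instance, `error_confidence_interval(0.2, 100)` gives an interval of roughly (0.122, 0.278), while an estimate of exactly 0 or 1 collapses the interval to a single point, which is the degenerate case noted in the text.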
              In normal practice, we first compute $\hat{P}e_d$ by designing and evaluating the
           classifier on the same set of n cases, $\hat{P}e_d(n)$. This is what we have done so far.
           As for $\hat{P}e_t$, we may compute it using an independent set of n cases, $\hat{P}e_t(n)$. In
           order to have some guidance on how to choose an appropriate dimensionality ratio,
           we would like to know the deviation of the expected values of these estimates from
           the Bayes error. Here the expectation is computed on a population of classifiers of
           the same type, trained under the same conditions. Formulas for these expectations,
           $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$, are quite intricate and can only be computed
           numerically. Like formula 6.25, they depend on the Bhattacharyya distance. A
           software tool, SC Size, computing these formulas for two classes with normally
           distributed features and equal covariance matrices, separated by a linear
           discriminant, is included on the book CD. SC Size also allows the
           computation of confidence intervals of these estimates, using formula 6.28.

















           Figure 6.15. Two-class linear discriminant $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$ curves, for
           d = 7 and δ² = 3, below and above the dotted line, respectively. The dotted line
           represents the Bayes error (0.193).
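
The Bayes error quoted in the caption can be checked with a short calculation. The sketch below assumes the standard result for two equiprobable classes with normally distributed features and a common covariance matrix, Pe = 1 − Φ(δ/2), where Φ is the standard normal distribution function and δ² is the squared distance between the class means. The function name `bayes_error` is illustrative.

```python
import math
from statistics import NormalDist


def bayes_error(delta2):
    """Bayes error for two equiprobable Gaussian classes, equal covariance.

    delta2 is the squared distance between the class means (the delta^2 of
    the figure caption). Uses Pe = 1 - Phi(delta / 2) = Phi(-delta / 2).
    """
    if delta2 < 0:
        raise ValueError("delta2 must be non-negative")
    return NormalDist().cdf(-math.sqrt(delta2) / 2.0)


if __name__ == "__main__":
    # For delta^2 = 3 this rounds to the 0.193 quoted in the caption.
    print(round(bayes_error(3.0), 3))
```

With δ² = 0 the classes coincide and the function returns 0.5, the error of a random guess.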


              Figure 6.15 is obtained with SC Size and illustrates how the expected values
           of the error estimates evolve with the n/d ratio, where n is assumed to be the
           number of cases in each class. The feature set dimension is d = 7. Both curves
           behave asymptotically as n → ∞ (see footnote 4), with the average design set error
           estimate converging to the Bayes error from below and the average test set error
           estimate converging from above.
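
The qualitative behaviour of the two curves can be reproduced by simulation. The sketch below does not implement SC Size or the numerical formulas mentioned in the text; it swaps in a Monte Carlo average over randomly drawn design sets, under simplifying assumptions that are not from the book: identity covariance, equal priors, and a linear discriminant built from the sample class means only. The function name `simulate_expected_errors` and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def simulate_expected_errors(n, d=7, delta2=3.0, reps=200, seed=0):
    """Monte Carlo averages of the design-set and test-set errors.

    n is the number of cases per class. Assumptions (not from the text):
    identity covariance, equal priors, nearest-class-mean linear discriminant.
    """
    rng = random.Random(seed)
    half = math.sqrt(delta2) / 2.0
    # Class means lie on the first axis, sqrt(delta2) apart.
    mu = {+1: [half] + [0.0] * (d - 1), -1: [-half] + [0.0] * (d - 1)}
    design_sum = 0.0
    test_sum = 0.0
    for _ in range(reps):
        # Design set: n cases per class.
        data = {s: [[m + rng.gauss(0.0, 1.0) for m in mu[s]] for _ in range(n)]
                for s in (+1, -1)}
        mean = {s: [sum(col) / n for col in zip(*data[s])] for s in (+1, -1)}
        # Linear discriminant g(x) = w.x - c; decide class +1 when g(x) > 0.
        w = [a - b for a, b in zip(mean[+1], mean[-1])]
        c = sum(wi * (a + b) / 2.0 for wi, a, b in zip(w, mean[+1], mean[-1]))

        def g(x):
            return sum(wi * xi for wi, xi in zip(w, x)) - c

        # Design-set error: fraction of design cases misclassified.
        wrong = sum(1 for s in (+1, -1) for x in data[s] if s * g(x) <= 0)
        design_sum += wrong / (2 * n)
        # Test-set error: exact error probability on independent cases,
        # which is what an independent test set estimates on average.
        norm_w = math.sqrt(sum(wi * wi for wi in w))
        test_sum += sum(PHI(-s * g(mu[s]) / norm_w) for s in (+1, -1)) / 2.0
    return design_sum / reps, test_sum / reps


if __name__ == "__main__":
    bayes = PHI(-math.sqrt(3.0) / 2.0)
    for n in (10, 20, 40, 80):
        design, test = simulate_expected_errors(n)
        print(f"n/d = {n / 7:5.2f}  design = {design:.3f}  "
              f"test = {test:.3f}  Bayes = {bayes:.3f}")
```

Under these assumptions the averaged design set error should sit below the Bayes error and the averaged test set error above it, with both gaps narrowing as n/d grows. The exact values need not match Figure 6.15, since the classifier here does not estimate the covariance matrix.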



           4 Numerical approximations in the computation of the average test set error may sometimes
             result in a slight deviation from the asymptotic behaviour, for large n.