

Let us use the training set estimates of these errors, Pe12 = 0.1 and Pe21 = 0.46 (see Figure 4.17). The average risk per cork stopper is now computed as R = 0.015 Pe12 + 0.01 Pe21 = 0.0061 €. If we had not used the adjusted prevalences, we would have obtained the higher risk of 0.0063 €.
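As a quick arithmetic check, the following sketch (plain Python; the loss coefficients and error estimates are the values quoted above) reproduces the adjusted-prevalence risk:

    # Average risk per cork stopper, using the estimates quoted in the text.
    lam12, lam21 = 0.015, 0.01   # losses (in euros) weighting the two error types
    pe12, pe21 = 0.1, 0.46       # training set error estimates (Figure 4.17)

    risk = lam12 * pe12 + lam21 * pe21
    print(f"Average risk per cork stopper: {risk:.4f} EUR")  # -> 0.0061 EUR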
For a set of classes Ω = {ω1, ..., ωc}, formula (4-20) generalizes to the conditional risk of deciding class ωi:

    R(αi | x) = Σj λij P(ωj | x),   j = 1, ..., c,

where λij is the loss incurred by deciding ωi when the true class is ωj. The average risk is minimized by choosing, for every x, the class with minimum conditional risk.

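As an illustration of the rule just stated (a minimal sketch, not code from the text), the function below computes the conditional risk of each candidate class from a loss matrix and a vector of posterior probabilities, and returns the minimum-risk decision:

    import numpy as np

    def bayes_min_risk(posteriors, losses):
        """Minimum-risk decision among c classes.

        posteriors : shape (c,), P(w_j | x) for each class
        losses     : shape (c, c), losses[i, j] = lambda_ij, the loss of
                     deciding class i when the true class is j
        """
        risks = losses @ posteriors   # R(a_i | x) = sum_j lambda_ij P(w_j | x)
        return int(np.argmin(risks))

    # Hypothetical two-class loss matrix (zero loss on correct decisions):
    losses = np.array([[0.0,  0.015],
                       [0.01, 0.0]])
    print(bayes_min_risk(np.array([0.3, 0.7]), losses))   # -> 1 (second class)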
The Bayes decision rule is not the only possibility in statistical classification. It is, however, by far the most popular rule. The interested reader can find alternatives to the Bayes rule for accounting for action losses in Fukunaga (1990).

Note also that, in practice, one tries to minimize the average risk by using estimates of the pdfs computed from a training set, as we have done above for the cork stoppers. If we have grounds to believe that the pdfs satisfy a certain parametric model, we can instead compute the appropriate parameters from the training set, as discussed next. In either case, we are using an empirical risk minimization (ERM) principle: minimization of an empirical risk instead of the true risk.
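To make the ERM idea concrete, here is a minimal sketch (the function and its arguments are hypothetical, not from the text) of an empirical risk: the average loss a classifier incurs over a labelled training set, used as a stand-in for the unavailable true risk:

    import numpy as np

    def empirical_risk(classify, X, y, losses):
        """Average loss of a classifier over a labelled training set.

        classify : function mapping a feature vector to a class index
        X        : (n, d) feature matrix;  y : (n,) true class indices
        losses   : (c, c) matrix, losses[i, j] = loss of deciding i
                   when the true class is j
        """
        decisions = np.array([classify(x) for x in X])
        return float(np.mean(losses[decisions, y]))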


                            4.2.2  Normal Bayesian Classification

Until now we have assumed no particular distribution model for the likelihoods. Frequently, however, the normal model is a reasonable assumption. The widespread applicability of the normal model has a justification related to the well-known Central Limit Theorem, according to which the sum of independent and identically distributed random variables has (under very general conditions) a distribution converging to the normal law as the number of variables increases. In practice, one frequently obtains good approximations to the normal law even for a relatively small number of added random variables (say, above 5). For features that can be considered the result of the addition of independent variables, the normal assumption is often an acceptable one.
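This claim is easy to check by simulation; the sketch below (illustrative only, with arbitrarily chosen uniform summands) standardizes the sum of five i.i.d. uniform variables and compares its skewness and excess kurtosis with the normal reference values of zero:

    import numpy as np

    rng = np.random.default_rng(0)
    n_vars, n_samples = 5, 100_000

    # Sum of 5 i.i.d. uniform random variables, standardized.
    s = rng.uniform(size=(n_samples, n_vars)).sum(axis=1)
    s = (s - s.mean()) / s.std()

    # A normal law has skewness 0 and excess kurtosis 0.
    skew = np.mean(s**3)
    kurt = np.mean(s**4) - 3.0
    print(f"skewness = {skew:+.3f}, excess kurtosis = {kurt:+.3f}")
    # Both come out close to zero, supporting the approximation.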
A normal likelihood for class ωi is expressed by the following pdf:

    p(x | ωi) = 1 / ((2π)^(d/2) |Σi|^(1/2)) exp(−(1/2)(x − μi)' Σi⁻¹ (x − μi)),   (4-21)

                            with:

   μi = Ei[x]                       mean vector for class ωi       (4-21a)
   Σi = Ei[(x − μi)(x − μi)']       covariance for class ωi.       (4-21b)
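As a practical sketch (assuming numpy and scipy are available; the data below are synthetic, for illustration only), the likelihood (4-21) can be evaluated with sample estimates standing in for μi and Σi:

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(1)
    X = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(100, 2))  # synthetic class-i data

    m_i = X.mean(axis=0)           # sample mean vector, estimate of mu_i
    C_i = np.cov(X, rowvar=False)  # sample covariance matrix, estimate of Sigma_i

    # Normal likelihood p(x | w_i) at a test point, formula (4-21):
    x = np.array([1.1, 1.9])
    likelihood = multivariate_normal(mean=m_i, cov=C_i).pdf(x)
    print(f"p(x | w_i) = {likelihood:.4f}")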


Note that μi and Σi, the distribution parameters, are the theoretical or true mean and covariance, whereas until now we have used the sample estimates mi and Ci,