Let us use the training set estimates of these errors, $Pe_{12} = 0.1$ and $Pe_{21} = 0.46$ (see Figure 4.17). The average risk per cork stopper is now computed as $R = 0.015\,Pe_{12} + 0.01\,Pe_{21} = 0.0061$ €. If we had not used the adjusted prevalences we would have obtained the higher risk of 0.0063 €.
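For clarity, the 0.0061 € figure follows directly from substituting the error estimates above:

$$R = 0.015 \times 0.1 + 0.01 \times 0.46 = 0.0015 + 0.0046 = 0.0061 \text{ €}.$$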
For a set of classes, $\Omega$, formula (4-20) generalizes to:
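A standard form of this generalization (written here on the assumption that $\lambda_{ij}$ denotes the loss incurred when a case from class $\omega_i$ is assigned to class $\omega_j$, $Pe_{ij}$ the corresponding error probability, and $c$ the number of classes) is:

$$R = \sum_{i=1}^{c} P(\omega_i) \sum_{j \ne i} \lambda_{ij}\, Pe_{ij},$$

which reduces to the two-class expression used above when $c = 2$.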
The Bayes decision rule is not the only possible rule in statistical classification. It is, however, by far the most popular one. The interested reader can find alternatives to the Bayes rule for taking action losses into account in Fukunaga (1990).
Note also that, in practice, one tries to minimize the average risk by using
estimates of pdfs computed from a training set, as we have done above for the cork
stoppers. If we have grounds to believe that the pdfs satisfy a certain parametric
model, we can instead compute the appropriate parameters from the training set, as
discussed next. In either case, we are using an empirical risk minimization (ERM)
principle: minimization of an empirical risk instead of a true risk.
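As a rough sketch of this idea (Python with NumPy; the function name, the single-feature threshold rule and the cost coefficients reused from the cork-stopper example are ours, purely for illustration), one can pick a decision rule by directly minimizing the risk estimated on the training set:

```python
import numpy as np

def best_threshold(x, y, c12=0.015, c21=0.01):
    """Choose a 1-D decision threshold by minimizing the empirical risk
    R_emp = c12*Pe12 + c21*Pe21 estimated on the training set (labels 1, 2;
    class 2 is assumed to lie above the threshold)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    candidates = np.unique(x)
    risks = []
    for t in candidates:
        pred = np.where(x > t, 2, 1)        # decide class 2 above the threshold
        pe12 = np.mean(pred[y == 1] == 2)   # class-1 cases misclassified
        pe21 = np.mean(pred[y == 2] == 1)   # class-2 cases misclassified
        risks.append(c12 * pe12 + c21 * pe21)
    return candidates[int(np.argmin(risks))]
```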
4.2.2 Normal Bayesian Classification
Until now we have assumed no particular distribution model for the likelihoods.
Frequently, however, the normal model is a reasonable assumption. The widespread applicability of the normal model has a justification related to the well-known Central Limit Theorem, according to which the sum of independent and
identically distributed random variables has (under very general conditions) a
distribution converging to the normal law with an increasing number of variables.
In practice, one frequently obtains good approximations to the normal law, even
for a relatively small number of added random variables (say, above 5). For
features that can be considered the result of the addition of independent variables,
the normal assumption is often an acceptable one.
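A quick numerical check of this convergence (a minimal NumPy/SciPy sketch; the choice of uniform variables and of five added terms is merely illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Sum of 5 independent uniform(0, 1) variables, 100 000 samples.
n_terms, n_samples = 5, 100_000
sums = rng.uniform(size=(n_samples, n_terms)).sum(axis=1)

# Compare a few empirical quantiles with those of the normal law having
# the same mean and standard deviation: they already agree quite closely.
mean, std = sums.mean(), sums.std()
for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"q={q:.2f}  empirical={np.quantile(sums, q):.3f}  "
          f"normal={norm.ppf(q, loc=mean, scale=std):.3f}")
```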
A normal likelihood for class $\omega_i$ is expressed by the following pdf ($d$ being the dimension of the feature vector $\mathbf{x}$):

$$p(\mathbf{x} \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)'\,\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right),$$

with:

$$\boldsymbol{\mu}_i = E_i[\mathbf{x}], \text{ the mean vector for class } \omega_i; \qquad \text{(4-21a)}$$

$$\Sigma_i = E_i[(\mathbf{x}-\boldsymbol{\mu}_i)(\mathbf{x}-\boldsymbol{\mu}_i)'], \text{ the covariance for class } \omega_i. \qquad \text{(4-21b)}$$
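As a small illustration of how these quantities are used in practice (a Python/NumPy sketch; the function names are ours, and SciPy's multivariate_normal is used only as a convenient way of evaluating the pdf above from sample estimates of the mean and covariance):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_params(X, y, classes):
    """Sample estimates (m_i, C_i) of the mean vector and covariance of each class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    means = [X[y == c].mean(axis=0) for c in classes]
    covs = [np.cov(X[y == c], rowvar=False) for c in classes]
    return means, covs

def normal_bayes_classify(x, means, covs, priors):
    """Return the index of the class maximizing P(w_i) * p(x | w_i), where
    each class-conditional pdf is the multivariate normal given above."""
    scores = [p * multivariate_normal.pdf(x, mean=m, cov=C)
              for m, C, p in zip(means, covs, priors)]
    return int(np.argmax(scores))
```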
Note that $\boldsymbol{\mu}_i$ and $\Sigma_i$, the distribution parameters, are the theoretical or true mean and covariance, whereas until now we have used the sample estimates $\mathbf{m}_i$ and $C_i$,