4.2 Bayesian Classification
In the previous section we presented linear classifiers based solely on the notion of
similarity, which was evaluated as a distance to class prototypes, usually the class
means. We did not assume anything specific regarding the pattern distributions,
mentioning only the fact that the distance metrics used should reflect the shape of
the pattern clusters around the means. As we saw, the Mahalanobis metric takes
care of this aspect through the use of the covariance matrix.
In the present section we will take into account the specific probability
distributions of the patterns in each class. Doing so, we will be able to address two
important issues:
- Is our classifier optimal in any sense?
- How can we adjust our classifier to the specific risks of a classification?
4.2.1 Bayes Rule for Minimum Risk
Let us consider again the cork stoppers problem and imagine that factory
production was restricted to the two classes we have been considering, denoted as:
ω1 = Super and ω2 = Average. Let us assume further that the factory had a record of
production stocks for a reasonably long period of time, summarized as:
Number of produced cork stoppers of class ω1: n1 = 901 420
Number of produced cork stoppers of class ω2: n2 = 1 352 130
Total number of produced cork stoppers: n = 2 253 550
With this information we can readily obtain good estimates of the probabilities
of producing a cork stopper from either of the two classes, the so-called prior
probabilities or prevalences:

P(ω1) = n1/n = 0.4,   P(ω2) = n2/n = 0.6.¹
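As a quick check of these figures, the prevalences can be computed directly from the production counts. The following Python snippet is only an illustrative sketch; the variable names are ours and not part of the cork stoppers data set.

```python
# Illustrative sketch: estimating the prevalences (prior probabilities)
# from the production record given above.

n1 = 901_420    # produced cork stoppers of class omega_1 (Super)
n2 = 1_352_130  # produced cork stoppers of class omega_2 (Average)
n = n1 + n2     # total production: 2 253 550

p1 = n1 / n     # estimated P(omega_1) = 0.4
p2 = n2 / n     # estimated P(omega_2) = 0.6

print(f"P(w1) = {p1:.3f}, P(w2) = {p2:.3f}")
```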
Note that the prevalences are not entirely controlled by the factory; they depend
mainly on the quality of the raw material. In the same way, a cardiologist does not
control how prevalent myocardial infarction is in a given population. Prevalences
can, therefore, be regarded as "states of nature".
Suppose we are asked to make a blind decision as to which class a cork stopper
belongs to without looking at it. If the only available information is the
prevalences, the sensible choice is class ω2. This way, we expect to be wrong only
40% of the time.
¹ Deviation from the true probability values is less than 0.0006, at a 95% confidence level.
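To make this prior-only ("blind") decision rule concrete, here is a minimal Python sketch, under the assumption that only the two estimated prevalences are available; the names and structure are illustrative.

```python
# Minimal sketch of the "blind" decision rule discussed above: with no
# measurement available, choose the class with the largest prevalence.
# The expected error rate is then 1 minus that prevalence.

priors = {"Super": 0.4, "Average": 0.6}   # prevalences estimated above

best_class = max(priors, key=priors.get)  # class with the highest prior
error_rate = 1.0 - priors[best_class]     # probability of being wrong

print(best_class, error_rate)  # Average 0.4 -> wrong 40% of the time
```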