Page 71 - Introduction to Statistical Pattern Recognition

3 Hypothesis Testing 53

$$r(X) = \min[q_1(X),\, q_2(X)] . \tag{3.6}$$

The total error, which is called the Bayes error, is computed by $E\{r(X)\}$.

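$$
\begin{aligned}
\varepsilon &= E\{r(X)\} = \int r(X)\,p(X)\,dX \\
&= \int \min[P_1 p_1(X),\, P_2 p_2(X)]\,dX \\
&= P_1 \int_{L_2} p_1(X)\,dX + P_2 \int_{L_1} p_2(X)\,dX \\
&= P_1 \varepsilon_1 + P_2 \varepsilon_2 ,
\end{aligned}
\tag{3.7}
$$

where

$$\varepsilon_1 = \int_{L_2} p_1(X)\,dX \quad \text{and} \quad \varepsilon_2 = \int_{L_1} p_2(X)\,dX . \tag{3.8}$$
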
Equation (3.7) shows several ways to express the Bayes error, $\varepsilon$. The
first line is the definition of $\varepsilon$. The second line is obtained by
inserting (3.6) into the first line and applying the Bayes theorem of (3.2). The
integral regions $L_1$ and $L_2$ of the third line are the regions where $X$ is
classified to $\omega_1$ and $\omega_2$ by this decision rule, and they are
called the $\omega_1$- and $\omega_2$-regions. In $L_1$,
$P_1 p_1(X) > P_2 p_2(X)$, and therefore $r(X) = P_2 p_2(X)/p(X)$. Likewise,
$r(X) = P_1 p_1(X)/p(X)$ in $L_2$ because $P_1 p_1(X) < P_2 p_2(X)$ in $L_2$.
In (3.8), we distinguish two types of errors: one results from misclassifying
samples from $\omega_1$ and the other results from misclassifying samples from
$\omega_2$. The total error is a weighted sum of these errors.
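
The agreement between the integral of the minimum and the weighted sum of the two region errors can be checked numerically. The sketch below is only an illustration under assumed conditions that are not taken from the text: two one-dimensional normal densities with equal variances and equal priors, for which the decision boundary lies midway between the means. The names `normal_pdf`, `normal_cdf`, `bayes_error_by_integration`, and `bayes_error_by_regions` are hypothetical helpers.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a one-dimensional normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Assumed illustrative setup (not from the text): equal priors, equal variances.
P1, P2 = 0.5, 0.5
M1, M2, STD = 0.0, 2.0, 1.0

def bayes_error_by_integration(lo=-10.0, hi=12.0, n=20_000):
    """Midpoint-rule estimate of the integral of min[P1*p1(x), P2*p2(x)]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += min(P1 * normal_pdf(x, M1, STD), P2 * normal_pdf(x, M2, STD))
    return total * h

def bayes_error_by_regions():
    """P1*eps1 + P2*eps2 with L1 = {x < t} and L2 = {x > t}."""
    t = 0.5 * (M1 + M2)  # boundary for equal priors and equal variances
    eps1 = 1.0 - normal_cdf(t, M1, STD)  # omega_1 samples falling in L2
    eps2 = normal_cdf(t, M2, STD)        # omega_2 samples falling in L1
    return P1 * eps1 + P2 * eps2

eps_int = bayes_error_by_integration()
eps_reg = bayes_error_by_regions()
print(eps_int, eps_reg)
```

Under these assumptions both routes should give the same value up to the accuracy of the numerical integration, since the region form is just the integral of the minimum split at the boundary.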

Figure 3-1 shows an example of this decision rule for a simple one-dimensional
case. The decision boundary is set at $x = t$ where $P_1 p_1(x) = P_2 p_2(x)$,
and $x < t$ and $x > t$ are designated to $L_1$ and $L_2$ respectively. The
resulting errors are $P_1 \varepsilon_1 = B + C$, $P_2 \varepsilon_2 = A$, and
$\varepsilon = A + B + C$, where $A$, $B$, and $C$ indicate the areas, for
example, $B = \int_{t}^{t'} P_1 p_1(x)\,dx$.
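
The bookkeeping of areas in this one-dimensional example can be reproduced with a short calculation. The following sketch assumes two normal densities with equal variances and unequal priors; these values, the closed-form boundary that holds only for equal variances, and the second boundary `t_prime` are illustrative assumptions and do not come from the figure.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a one-dimensional normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Assumed illustrative values (not from the text).
P1, P2 = 0.6, 0.4
M1, M2, STD = 0.0, 2.0, 1.0

# Boundary t where P1*p1(t) = P2*p2(t); this closed form assumes equal variances.
t = 0.5 * (M1 + M2) + STD ** 2 * math.log(P1 / P2) / (M2 - M1)
t_prime = t + 0.5  # hypothetical second boundary to the right of t

def total_error(boundary):
    """Error when x < boundary is assigned to omega_1 and x > boundary to omega_2."""
    p1_eps1 = P1 * (1.0 - normal_cdf(boundary, M1, STD))  # omega_1 mass right of boundary
    p2_eps2 = P2 * normal_cdf(boundary, M2, STD)          # omega_2 mass left of boundary
    return p1_eps1 + p2_eps2

# Areas under the weighted densities.
A = P2 * normal_cdf(t, M2, STD)                                   # P2*p2 over x < t
B = P1 * (normal_cdf(t_prime, M1, STD) - normal_cdf(t, M1, STD))  # P1*p1 over t < x < t'
C = P1 * (1.0 - normal_cdf(t_prime, M1, STD))                     # P1*p1 over x > t'

eps = total_error(t)
print(t, A, B, C, eps)
```

With these assumed values, `A + B + C` should reproduce `total_error(t)`, and evaluating `total_error` at any other boundary, such as `t_prime`, gives a way to compare the resulting error against the one at `t`.
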
This decision rule gives the smallest probability of error. This may be
demonstrated easily from the one-dimensional example of Fig. 3-1. Suppose
that the boundary is moved from $t$ to $t'$, setting up the new $\omega_1$- and
$\omega_2$-regions as $L_1'$ and $L_2'$. Then, the resulting errors are
$P_1 \varepsilon_1' = C$, $P_2 \varepsilon_2' = A + B + D$, and
$\varepsilon' = A + B + C + D$, which is larger than $\varepsilon$ by $D$. The same is true when the
