$$E\{s \mid \omega_2\} = m \int p_2(X) \ln \frac{p_2(X)}{p_1(X)}\, dX \;\ge\; m \int p_2(X)\left[1 - \frac{p_1(X)}{p_2(X)}\right] dX = 0 \qquad (3.179)$$
where the inequalities are derived from $\ln x \le x - 1$. The equalities in (3.178) and (3.179) hold only when $p_1(X) = p_2(X)$.
Thus, as $m$ increases, $E\{s \mid \omega_1\}$ decreases and $E\{s \mid \omega_2\}$ increases in proportion to $m$, while the standard deviations increase in proportion to $\sqrt{m}$. This is true regardless of $p_1(X)$ and $p_2(X)$ as long as $p_1(X) \ne p_2(X)$. Therefore, the density functions of $s$ for $\omega_1$ and $\omega_2$ become more separable as $m$ increases. Also, by the central limit theorem, the density function of $s$ tends toward a normal distribution for large $m$.
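To make these growth rates concrete, here is a minimal Monte Carlo sketch under an assumed one-dimensional pair of densities $p_1 = N(0,1)$ and $p_2 = N(1,1)$ (both the density choices and the helper name `h` are hypothetical illustrations, not from the text); for these densities $h(x) = \ln p_2(x) - \ln p_1(x)$ reduces to $x - 1/2$, and the printed statistics should show the means of $s$ scaling with $m$ while the standard deviations scale with $\sqrt{m}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-dimensional densities: p1 = N(0,1), p2 = N(1,1).
# For these, h(x) = ln p2(x) - ln p1(x) simplifies to x - 1/2.
def h(x):
    return x - 0.5

for m in (1, 10, 100):
    x1 = rng.normal(0.0, 1.0, size=(100_000, m))   # observations under omega_1
    x2 = rng.normal(1.0, 1.0, size=(100_000, m))   # observations under omega_2
    s1 = h(x1).sum(axis=1)                          # s = h(X_1) + ... + h(X_m)
    s2 = h(x2).sum(axis=1)
    # E{s|omega_1} ~ -m/2 and E{s|omega_2} ~ +m/2; both stds ~ sqrt(m).
    print(f"m={m:3d}  E[s|w1]={s1.mean():8.2f} std={s1.std():5.2f}  "
          f"E[s|w2]={s2.mean():7.2f} std={s2.std():5.2f}")
```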
Example 16: In order to see the effect of $m$ easily, let us study a simple example in which $h(X)$ is distributed as $N_h(-\eta, 1)$ for $\omega_1$ and $N_h(+\eta, 1)$ for $\omega_2$. Then, $s$ is distributed as $N_s(-m\eta, m)$ for $\omega_1$ and $N_s(+m\eta, m)$ for $\omega_2$. Therefore, the Bayes error of the sequential classifier for $P_1 = P_2 = 0.5$ is
$$\varepsilon = \Phi\!\left(\frac{-m\eta}{\sqrt{m}}\right) = \Phi(-\sqrt{m}\,\eta) \qquad (3.180)$$
where $\Phi(\cdot)$ is the normal error function. Figure 3-22 shows the relation between $\varepsilon$ and $m$ for various $\eta$.
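As a numerical illustration of (3.180), the following sketch simply evaluates $\varepsilon = \Phi(-\sqrt{m}\,\eta)$ over an illustrative grid of $m$ and $\eta$ (the grid values are arbitrary choices, not the ones plotted in Figure 3-22); for fixed $\eta$, the error decays rapidly with $m$:

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF, playing the role of Phi(.)

# eps = Phi(-sqrt(m) * eta) for an illustrative grid of m and eta,
# mirroring the relation shown in Figure 3-22.
for eta in (0.1, 0.2, 0.5):
    for m in (1, 10, 100, 400):
        eps = Phi(-sqrt(m) * eta)
        print(f"eta={eta:.1f}  m={m:3d}  eps={eps:.4f}")
```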
In practice, the $p_i(X)$'s are not known, and the Bayes classifier is hard to design. Therefore, in place of the Bayes classifier, some classifiers such as the quadratic classifier of (3.11) and the linear classifier of (3.12) are often used.
These two classifiers satisfy

$$E\{h(X) \mid \omega_1\} \le 0 \quad \text{and} \quad E\{h(X) \mid \omega_2\} \ge 0 \qquad (3.181)$$

regardless of the distributions of $X$, as shown in (3.143), (3.97), and (3.98), respectively. Note here that (3.97) and (3.98) can be derived from (3.96)
regardless of the selection of $\Sigma$. Therefore, by increasing $m$, we can make the errors of these classifiers as small as we like. However, note from (3.97) and (3.98) that $E\{h(X) \mid \omega_1\} = E\{h(X) \mid \omega_2\} = 0$ for $M_1 = M_2$. Therefore, when $M_1 = M_2$, we cannot use the linear classifier of (3.12) for sequential operation.
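To illustrate this contrast, here is a minimal sketch under an assumed equal-mean pair $\omega_1 \sim N(0,1)$ and $\omega_2 \sim N(0,4)$ (the one-dimensional setup and the name `h_quad` are hypothetical). For these densities the quadratic classifier gives $h(x) = -\ln[p_1(x)/p_2(x)] = (3/8)x^2 - \ln 2$, whose expected value is strictly negative under $\omega_1$ and strictly positive under $\omega_2$, so (3.181) holds strictly and the sequential error still shrinks with $m$ even though $M_1 = M_2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical equal-mean pair: omega_1 ~ N(0,1), omega_2 ~ N(0,4).
# Quadratic classifier for these densities: h(x) = (3/8) x^2 - ln 2.
def h_quad(x):
    return 0.375 * x**2 - np.log(2.0)

for m in (1, 10, 100):
    x1 = rng.normal(0.0, 1.0, size=(20_000, m))    # omega_1 observations
    x2 = rng.normal(0.0, 2.0, size=(20_000, m))    # omega_2 observations
    s1 = h_quad(x1).sum(axis=1)
    s2 = h_quad(x2).sum(axis=1)
    # Decide omega_2 when s > 0; error rate for P1 = P2 = 0.5.
    err = 0.5 * (np.mean(s1 > 0) + np.mean(s2 < 0))
    print(f"m={m:3d}  error={err:.4f}")
```

A linear $h(X)$ offers no such escape here: with $M_1 = M_2$ its mean is zero under both classes, so the two densities of $s$ never drift apart no matter how large $m$ becomes.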