Page 150 - Introduction to Statistical Pattern Recognition
$$h(X) = V^T X + v_0 \;\underset{\omega_2}{\overset{\omega_1}{\lessgtr}}\; 0 \qquad (4.18)$$
The term $h(X)$ is a linear function of $X$ and is called a linear discriminant
function. Our design work is to find the optimum coefficients $V = [v_1 \ldots v_n]^T$
and the threshold value $v_0$ for given distributions under various criteria. The
linear discriminant function becomes the minus-log likelihood ratio when the
given distributions are normal with equal covariance matrices.
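To make the last statement concrete, the following is a short derivation sketch. The symbols $M_1$, $M_2$ (class mean vectors), $\Sigma$ (the common covariance matrix), and $P_1$, $P_2$ (prior probabilities) are introduced here as assumptions for illustration. For normal densities $p_1(X)$ and $p_2(X)$ with equal covariance, the quadratic term $X^T \Sigma^{-1} X$ cancels:

$$-\ln\frac{p_1(X)}{p_2(X)} = \tfrac{1}{2}(X - M_1)^T \Sigma^{-1} (X - M_1) - \tfrac{1}{2}(X - M_2)^T \Sigma^{-1} (X - M_2)$$

$$= (M_2 - M_1)^T \Sigma^{-1} X + \tfrac{1}{2}\left(M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2\right)$$

Comparing this with the form $V^T X + v_0$ suggests $V = \Sigma^{-1}(M_2 - M_1)$, with $v_0$ given by the constant term, to which a Bayes threshold would add $-\ln(P_1/P_2)$. Under equal priors that extra term vanishes.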
However, the reader should be cautioned that no linear classifier works
well for distributions that are separated not by the mean-difference but
by the covariance-difference. In this case, we have no choice but to
adopt a more complex classifier such as a quadratic one. The first and second
terms of the Bhattacharyya distance, (3.152), indicate where the class
separability comes from, namely the mean-difference or the covariance-difference.
Optimum Design Procedure
Equation (4.18) indicates that an n-dimensional vector $X$ is projected
onto a vector $V$, and that the variable, $y = V^T X$, in the projected
one-dimensional h-space is classified to either $\omega_1$ or $\omega_2$, depending on whether
$y < -v_0$ or $y > -v_0$. Figure 4-7 shows an example in which distributions are
projected onto two vectors, $V$ and $V'$. On each mapped space, the threshold,
$v_0$, is chosen to separate the $\omega_1$- and $\omega_2$-regions, resulting in the hatched error
probability. As seen in Fig. 4-7, the error on $V$ is smaller than that on $V'$.
Therefore, the optimum design procedure for a linear classifier is to select $V$
and $v_0$ which give the smallest error in the projected h-space.
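The projection idea can be illustrated numerically. The sketch below is not the procedure of the text. It assumes synthetic two-dimensional normal data with equal priors, and a brute-force threshold scan stands in for whichever design criterion is actually used. The function names, the sample data, and the two directions `V` and `V_prime` are all hypothetical.

```python
import numpy as np

def projected_error(X1, X2, V, v0):
    """Empirical error of h(X) = V^T X + v0, deciding class 1 when h(X) < 0."""
    h1 = X1 @ V + v0   # class-1 samples are misclassified when h >= 0
    h2 = X2 @ V + v0   # class-2 samples are misclassified when h <= 0
    # Equal priors are assumed, so the two class error rates are averaged.
    return 0.5 * np.mean(h1 >= 0) + 0.5 * np.mean(h2 <= 0)

def best_threshold(X1, X2, V):
    """Scan candidate thresholds on the projected axis; return (v0, error)."""
    y = np.concatenate([X1 @ V, X2 @ V])
    candidates = -np.sort(y)   # places the boundary y = -v0 at each sample
    errors = [projected_error(X1, X2, V, v0) for v0 in candidates]
    k = int(np.argmin(errors))
    return float(candidates[k]), float(errors[k])

# Synthetic data (an assumption): common covariance, means differ along axis 0.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
M1 = np.array([0.0, 0.0])
M2 = np.array([2.0, 0.0])
X1 = rng.multivariate_normal(M1, cov, size=2000)
X2 = rng.multivariate_normal(M2, cov, size=2000)

V = np.linalg.solve(cov, M2 - M1)   # direction Sigma^{-1}(M2 - M1)
V_prime = np.array([0.0, 1.0])      # a poor direction: both class means project to 0

v0_V, err_V = best_threshold(X1, X2, V)
v0_Vp, err_Vp = best_threshold(X1, X2, V_prime)
print(f"error on V : {err_V:.3f}")
print(f"error on V': {err_Vp:.3f}")
```

With this data the error on `V` should come out far smaller than the error on `V_prime`, which mirrors the comparison drawn from Fig. 4-7.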
When $X$ is normally distributed, $h(X)$ of (4.18) is also normal. Therefore,
the error in the h-space is determined by $\eta_i = E\{h(X) \mid \omega_i\}$ and
$\sigma_i^2 = \mathrm{Var}\{h(X) \mid \omega_i\}$, which are functions of $V$ and $v_0$. Thus, as will be
discussed later, the error may be minimized with respect to $V$ and $v_0$. Even if $X$
is not normally distributed, $h(X)$ could be close to normal for large $n$, because
$h(X)$ is the summation of $n$ terms and the central limit theorem may come into
effect. In this case, a function of $\eta_i$ and $\sigma_i^2$ could be a reasonable criterion to
measure the class separability in the h-space.
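As a minimal sketch of how the error follows from the two conditional moments, the code below assumes each class is normal with mean vector `M` and covariance matrix `S`, so that the conditional mean of h is V^T M + v0 and the conditional variance is V^T S V. The function names, the priors `P1` and `P2`, and the numerical example are assumptions for illustration, not taken from the text.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def h_moments(V, v0, M, S):
    """Mean and variance of h(X) = V^T X + v0 when X ~ N(M, S)."""
    eta = float(V @ M + v0)
    var = float(V @ S @ V)
    return eta, var

def linear_classifier_error(V, v0, M1, S1, M2, S2, P1=0.5, P2=0.5):
    """Error of the rule 'h(X) < 0 -> class 1' for two normal classes."""
    eta1, var1 = h_moments(V, v0, M1, S1)
    eta2, var2 = h_moments(V, v0, M2, S2)
    e1 = 1.0 - normal_cdf(-eta1 / math.sqrt(var1))  # P(h > 0 | class 1)
    e2 = normal_cdf(-eta2 / math.sqrt(var2))        # P(h < 0 | class 2)
    return P1 * e1 + P2 * e2

# Hypothetical example: identity covariances, means two units apart.
M1 = np.array([0.0, 0.0])
M2 = np.array([2.0, 0.0])
S = np.eye(2)
V = np.linalg.solve(S, M2 - M1)     # Sigma^{-1}(M2 - M1)
v0 = 0.5 * float(M1 @ np.linalg.solve(S, M1) - M2 @ np.linalg.solve(S, M2))

err = linear_classifier_error(V, v0, M1, S, M2, S)
print(f"error = {err:.4f}")
```

Because the error depends on `V` and `v0` only through these four scalars, any numerical search over `V` and `v0` needs to recompute nothing more than the two means and two variances at each step.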