E\{d^2 \mid \omega_2\} = \sum_{i=1}^{n} \lambda_i + M^T M    (3.63)
Likewise, the variance can be computed as

Var\{d^2 \mid \omega_2\} = E\{(d^2)^2 \mid \omega_2\} - E^2\{d^2 \mid \omega_2\} .    (3.64)
When the ω₂-distribution is normal,

E\{(d^2)^2 \mid \omega_2\} = E\{(X-M)^T(X-M)(X-M)^T(X-M) \mid \omega_2\}
    + 4 M^T E\{(X-M)(X-M)^T \mid \omega_2\} M
    + (M^T M)^2 + 2 E\{(X-M)^T(X-M) \mid \omega_2\} M^T M
  = 3 \sum_{i=1}^{n} \lambda_i^2 + \sum_{i \ne j} \lambda_i \lambda_j + 4 \sum_{i=1}^{n} \lambda_i m_i^2
    + 2 \sum_{i=1}^{n} \lambda_i M^T M + (M^T M)^2    (3.65)
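The first term of (3.65) comes from the Gaussian fourth-moment identity. As a short check (assuming, as in this passage, that the components zᵢ of Z = X − M are independent under ω₂ with variances λᵢ),

E\{(Z^T Z)^2 \mid \omega_2\} = \sum_{i=1}^{n} E\{z_i^4\} + \sum_{i \ne j} E\{z_i^2\} E\{z_j^2\} = 3 \sum_{i=1}^{n} \lambda_i^2 + \sum_{i \ne j} \lambda_i \lambda_j ,

since E{zᵢ⁴} = 3λᵢ² for a zero-mean normal variable. Likewise, E{(X−M)(X−M)ᵀ|ω₂} = Λ = diag(λ₁, ..., λₙ) reduces the second term of (3.65) to 4MᵀΛM = 4Σλᵢmᵢ².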
where mᵢ is the ith component of M. Subtracting E²{d²|ω₂} of (3.63), we obtain

Var\{d^2 \mid \omega_2\} = 2 \sum_{i=1}^{n} \lambda_i^2 + 4 \sum_{i=1}^{n} \lambda_i m_i^2 .    (3.66)
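The cancellation can be made explicit by squaring (3.63):

E^2\{d^2 \mid \omega_2\} = \Big( \sum_{i=1}^{n} \lambda_i \Big)^2 + 2 \sum_{i=1}^{n} \lambda_i M^T M + (M^T M)^2 = \sum_{i=1}^{n} \lambda_i^2 + \sum_{i \ne j} \lambda_i \lambda_j + 2 \sum_{i=1}^{n} \lambda_i M^T M + (M^T M)^2 ,

so every term of (3.65) is cancelled except 2Σλᵢ² from the fourth-moment term and 4Σλᵢmᵢ².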
Example 8: For Data I-I with n variables, λᵢ = 1. Therefore,

E\{d^2 \mid \omega_1\} = n \quad and \quad Var\{d^2 \mid \omega_1\} = 2n ,    (3.67)
E\{d^2 \mid \omega_2\} = n + M^T M \quad and \quad Var\{d^2 \mid \omega_2\} = 2n + 4 M^T M .    (3.68)
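The moments in (3.67) and (3.68) are easy to confirm by simulation. A minimal sketch, assuming NumPy is available; the values of n and M below are arbitrary illustrative choices, not taken from the text:

    import numpy as np

    rng = np.random.default_rng(0)
    n, N = 8, 200000                          # dimensionality and sample size (illustrative)
    M = np.full(n, 0.5)                       # hypothetical M; here M^T M = 2.0

    X1 = rng.standard_normal((N, n))          # omega_1 samples: N(0, I)
    X2 = rng.standard_normal((N, n)) + M      # omega_2 samples: N(M, I), as in Data I-I
    d2_1 = (X1 ** 2).sum(axis=1)              # d^2 = X^T X (distance from M1 = 0)
    d2_2 = (X2 ** 2).sum(axis=1)

    MtM = float(M @ M)
    print(d2_1.mean(), "vs", n)               # (3.67): E{d^2|w1} = n
    print(d2_1.var(), "vs", 2 * n)            # (3.67): Var{d^2|w1} = 2n
    print(d2_2.mean(), "vs", n + MtM)         # (3.68): E{d^2|w2} = n + M^T M
    print(d2_2.var(), "vs", 2 * n + 4 * MtM)  # (3.68): Var{d^2|w2} = 2n + 4 M^T M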
If we assume normal distributions for d², we can design the Bayes classifier and compute the Bayes error in the d-space, ε_d. The normality assumption for d² is reasonable for high-dimensional data, because d² is the summation of n terms as seen in (3.51) and the central limit theorem can be applied.
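Under that assumption, ε_d is the Bayes error between the two univariate normals N(n, 2n) and N(n + MᵀM, 2n + 4MᵀM). A minimal sketch of the computation, assuming SciPy, equal priors, and illustrative values of n and MᵀM (none of these numbers come from the text):

    import numpy as np
    from scipy.stats import norm

    n, MtM = 8, 6.25                             # illustrative choices only
    m1, s1 = n, np.sqrt(2 * n)                   # normal approximation of d^2 under omega_1
    m2, s2 = n + MtM, np.sqrt(2 * n + 4 * MtM)   # normal approximation of d^2 under omega_2

    t = np.linspace(m1 - 8 * s1, m2 + 8 * s2, 200001)
    p1 = norm.pdf(t, m1, s1)
    p2 = norm.pdf(t, m2, s2)

    # Bayes error with equal priors: half the integral of the smaller density.
    eps_d = 0.5 * np.minimum(p1, p2).sum() * (t[1] - t[0])
    print(eps_d)

For comparison, the X-space Bayes error for Data I-I with equal priors is Φ(−√(MᵀM)/2), about 0.106 for the MᵀM = 6.25 used above.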
The error ε_d is determined by n and MᵀM, while MᵀM specifies the Bayes error in the X-space. In order to show how much classification information is lost by mapping the n-dimensional X into the one-dimensional d², the relation between