Page 114 - Introduction to Statistical Pattern Recognition


$$
+ \frac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_2|} - t . \tag{3.140}
$$

Or, after simultaneous diagonalization from $\Sigma_1$ and $\Sigma_2$ to $I$ and $\Lambda$ by $Y = A^T X$,

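$$
E\{h(Y)\mid\omega_1\} = \frac{1}{2}\sum_{i=1}^{n}\left[\left(1-\frac{1}{\lambda_i}\right) - \frac{(d_{2i}-d_{1i})^2}{\lambda_i} + \ln\frac{1}{\lambda_i}\right] - t , \tag{3.141}
$$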

$$
E\{h(Y)\mid\omega_2\} = \frac{1}{2}\sum_{i=1}^{n}\left[(\lambda_i-1) + (d_{2i}-d_{1i})^2 + \ln\frac{1}{\lambda_i}\right] - t , \tag{3.142}
$$
where $d_{ki}$ is the $i$th component of $D_k = A^T M_k$.
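The two expected values can be evaluated directly once the eigenvalues and the transformed means are known. The following is a minimal sketch, assuming the forms of (3.141) and (3.142) with the bracketed terms $(1-1/\lambda_i) - (d_{2i}-d_{1i})^2/\lambda_i + \ln(1/\lambda_i)$ and $(\lambda_i-1) + (d_{2i}-d_{1i})^2 + \ln(1/\lambda_i)$; the function name `expected_h` and its argument names are hypothetical and chosen only for illustration.

```python
import math


def expected_h(lams, d1, d2, t=0.0):
    """Return (E{h(Y)|w1}, E{h(Y)|w2}) in the diagonalized Y-space.

    lams : eigenvalues lambda_i (diagonal of Lambda), all positive
    d1, d2 : components d_1i and d_2i of the transformed mean vectors
    t : threshold of the classifier
    (All names are illustrative assumptions, not from the text.)
    """
    sum1 = 0.0
    sum2 = 0.0
    for lam, a, b in zip(lams, d1, d2):
        diff_sq = (b - a) ** 2
        log_term = math.log(1.0 / lam)
        # Bracketed term of (3.141)
        sum1 += (1.0 - 1.0 / lam) - diff_sq / lam + log_term
        # Bracketed term of (3.142)
        sum2 += (lam - 1.0) + diff_sq + log_term
    return 0.5 * sum1 - t, 0.5 * sum2 - t


if __name__ == "__main__":
    e1, e2 = expected_h([2.0, 0.5], [0.0, 0.0], [1.0, -1.0])
    print(e1, e2)
```

With equal covariances (all $\lambda_i = 1$) the eigenvalue terms vanish and only the squared mean differences remain, which gives a quick sanity check on the implementation.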
An interesting property emerges from (3.141) and (3.142). That is, if
t = 0,

$$
E\{h(Y)\mid\omega_1\} \le 0 \quad \text{and} \quad E\{h(Y)\mid\omega_2\} \ge 0 \tag{3.143}
$$

regardless of the distributions of X. These inequalities may be proved by using
$\ln x \le x - 1$. From (3.141), $(1 - 1/\lambda_i) + \ln(1/\lambda_i) \le 0$ and $-(d_{2i}-d_{1i})^2/\lambda_i \le 0$ yield
$E\{h(Y)\mid\omega_1\} \le 0$ for $t = 0$. Also, from (3.142), $(\lambda_i - 1) - \ln\lambda_i \ge 0$ and
$(d_{2i}-d_{1i})^2 \ge 0$ yield $E\{h(Y)\mid\omega_2\} \ge 0$ for $t = 0$.
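The sign property can also be spot-checked numerically. The sketch below draws random positive eigenvalues and random mean components, evaluates the two expected values at $t = 0$, and confirms that the first is never positive and the second never negative. It assumes the bracketed terms $(1-1/\lambda_i) - (d_{2i}-d_{1i})^2/\lambda_i + \ln(1/\lambda_i)$ for (3.141) and $(\lambda_i-1) + (d_{2i}-d_{1i})^2 + \ln(1/\lambda_i)$ for (3.142); the function names, parameter ranges, and tolerance are hypothetical choices made for illustration, and a random check of this kind is no substitute for the proof.

```python
import math
import random


def expected_h(lams, d1, d2, t=0.0):
    """Evaluate (E{h(Y)|w1}, E{h(Y)|w2}) from the assumed forms of (3.141), (3.142)."""
    sum1 = 0.0
    sum2 = 0.0
    for lam, a, b in zip(lams, d1, d2):
        diff_sq = (b - a) ** 2
        log_term = math.log(1.0 / lam)
        sum1 += (1.0 - 1.0 / lam) - diff_sq / lam + log_term
        sum2 += (lam - 1.0) + diff_sq + log_term
    return 0.5 * sum1 - t, 0.5 * sum2 - t


def check_sign_property(trials=1000, n=4, seed=0, tol=1e-12):
    """Return True if E1 <= 0 and E2 >= 0 held (up to tol) in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Eigenvalues spread over several orders of magnitude (illustrative range)
        lams = [math.exp(rng.uniform(-3.0, 3.0)) for _ in range(n)]
        d1 = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        d2 = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        e1, e2 = expected_h(lams, d1, d2, t=0.0)
        if e1 > tol or e2 < -tol:
            return False
    return True


if __name__ == "__main__":
    print(check_sign_property())
```

The small tolerance only absorbs floating-point rounding near $\lambda_i = 1$ with equal means, where both expected values are exactly zero.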

Variance of h(X): The computation of the variance is more involved.
Therefore, only the results for normal distributions are presented here. The
reader is encouraged to confirm these results. It is suggested to work in the Y-space,
where the two covariances are diagonalized to $I$ and $\Lambda$.
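One way to confirm such results numerically is a small Monte Carlo experiment in the Y-space. The sketch below assumes that $h$ is the quadratic discriminant whose Y-space form is $\frac{1}{2}\sum_i \left[(y_i - d_{1i})^2 - (y_i - d_{2i})^2/\lambda_i + \ln(1/\lambda_i)\right] - t$ (the definition of $h$ does not appear in this excerpt), and that the classes are normal with covariances $I$ and $\Lambda$. The function names, sample size, and seed are hypothetical. The sample mean can be compared with (3.141) and (3.142), and the sample variance with whatever closed-form result is being checked.

```python
import math
import random


def h(y, lams, d1, d2, t=0.0):
    """Quadratic discriminant in the diagonalized Y-space (assumed form)."""
    total = 0.0
    for yi, lam, a, b in zip(y, lams, d1, d2):
        total += (yi - a) ** 2 - (yi - b) ** 2 / lam + math.log(1.0 / lam)
    return 0.5 * total - t


def sample_moments(cls, lams, d1, d2, n_samples=20000, seed=0):
    """Estimate mean and variance of h(Y) for normal samples from class 1 or 2."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        if cls == 1:
            # Class 1: mean D1, covariance I
            y = [rng.gauss(a, 1.0) for a in d1]
        else:
            # Class 2: mean D2, covariance Lambda (diagonal)
            y = [rng.gauss(b, math.sqrt(lam)) for b, lam in zip(d2, lams)]
        values.append(h(y, lams, d1, d2))
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    lams, d1, d2 = [2.0, 0.5], [0.0, 0.0], [1.0, -1.0]
    print(sample_moments(1, lams, d1, d2))
    print(sample_moments(2, lams, d1, d2))
```

Because the covariances are diagonal in the Y-space, each component can be sampled independently, which is the practical reason for working there rather than in the original X-space.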
