Page 285 - Introduction to Statistical Pattern Recognition


6 Nonparametric Density Estimation 267

(6.60)

or
$$\lambda_k^2 + \lambda_k \left( \sum_{i=1}^{n} \lambda_i \right) = \frac{\mu}{\sigma^2} \qquad (k = 1, \ldots, n). \tag{6.61}$$
In order to satisfy (6.61), all $\lambda_i$'s must be equal. Since $|A| = 1$, the solution
of (6.61) must be

$$A = I. \tag{6.62}$$
That is, in the transformed Z-space, the optimal matrix $A_Z$ is $I$ for $B_Z = I$.
Therefore, the optimal matrix $A$ to use in the original X-space is identical to $B$
of (6.51) [5]. The neighborhoods should take the same ellipsoidal shape as the
underlying distribution. For the normal distribution we see that the covariance
matrix $B = \Sigma$ is indeed optimal for $A$.
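The correspondence between $A_Z = I$ in the Z-space and $A = B$ in the X-space can be checked numerically. The following is a minimal sketch, assuming the transformation is the whitening map $Z = B^{-1/2} X$; the matrix `B` and the sample points are hypothetical values chosen only for illustration, and `B` is not rescaled to unit determinant, so the check concerns only the shape of the neighborhood, not its size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric positive-definite matrix B (illustrative values only).
G = rng.standard_normal((3, 3))
B = G @ G.T + 3.0 * np.eye(3)

# Assumed whitening transform W = B^{-1/2}, built from the eigendecomposition of B.
eigvals, eigvecs = np.linalg.eigh(B)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

X = rng.standard_normal((5, 3))  # a few arbitrary points in the X-space
Z = X @ W.T                      # the same points mapped to the Z-space

# Squared distance with the metric A = I in the Z-space ...
d2_z = np.sum(Z * Z, axis=1)
# ... and squared distance with the metric A = B in the X-space.
d2_x = np.einsum("ij,jk,ik->i", X, np.linalg.inv(B), X)

# B expressed in the Z-space; under the assumed transform this is the identity.
whitened_B = W @ B @ W.T

print(np.allclose(d2_z, d2_x), np.allclose(whitened_B, np.eye(3)))
```

A spherical neighborhood in the Z-space therefore maps back to an ellipsoid shaped by $B$ in the X-space.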
It is important to notice that (6.62) is the locally optimal metric regardless
of the location, because IMSE* of (6.54) is minimized not after but before
taking the integration. The same result can be obtained by minimizing MSE*
of (6.38).

Normal Case

In order to get an idea of what kind of numbers should be used for $r$, in
this section let us compute the optimal $r$ for a normal distribution. The partial
derivatives $\nabla p(X)$ and $\nabla^2 p(X)$ for $N_X(M, \Sigma)$ are

$$\nabla p(X) = -p(X)\, \Sigma^{-1} (X - M), \tag{6.63}$$

$$\nabla^2 p(X) = p(X) \left[ \Sigma^{-1} (X - M)(X - M)^T \Sigma^{-1} - \Sigma^{-1} \right]. \tag{6.64}$$
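The closed forms for the gradient and the Hessian of a normal density can be compared against central finite differences. The sketch below is illustrative only: the function names, the mean `m`, the covariance `cov`, and the evaluation point `x` are assumptions introduced for this check and do not come from the text.

```python
import numpy as np

def normal_pdf(x, m, cov):
    """Density of a normal distribution with mean m and covariance cov at x."""
    n = len(m)
    d = x - m
    q = d @ np.linalg.solve(cov, d)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))

def grad_closed(x, m, cov):
    """Closed-form gradient: -p(X) * inv(cov) @ (X - M)."""
    return -normal_pdf(x, m, cov) * np.linalg.solve(cov, x - m)

def hess_closed(x, m, cov):
    """Closed-form Hessian: p(X) * (u u^T - inv(cov)) with u = inv(cov) @ (X - M)."""
    u = np.linalg.solve(cov, x - m)
    return normal_pdf(x, m, cov) * (np.outer(u, u) - np.linalg.inv(cov))

def grad_fd(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hess_fd(f, x, h=1e-4):
    """Central-difference Hessian, differencing the finite-difference gradient."""
    H = np.zeros((len(x), len(x)))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        H[:, i] = (grad_fd(f, x + e, h) - grad_fd(f, x - e, h)) / (2 * h)
    return H

# Hypothetical parameters and evaluation point (illustrative values only).
m = np.array([1.0, -0.5])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])
x = np.array([0.3, 0.4])

f = lambda y: normal_pdf(y, m, cov)
grad_err = np.max(np.abs(grad_closed(x, m, cov) - grad_fd(f, x)))
hess_err = np.max(np.abs(hess_closed(x, m, cov) - hess_fd(f, x)))
print(grad_err, hess_err)
```

Both errors should be small relative to the derivative values themselves, which is what agreement between the closed forms and the numerical derivatives looks like at this step size.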
For the simplest case in which $M = 0$ and $\Sigma = I$,

$$\mathrm{tr}\{\nabla^2 p(X)\} = p(X)(X^T X - n) = p(X) \left( \sum_{i=1}^{n} x_i^2 - n \right). \tag{6.65}$$
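Equation (6.65) implies that the trace of the Hessian is negative inside the sphere $X^T X = n$, zero on it, and positive outside. A short sketch of this, with a finite-difference Laplacian as a cross-check; the function names and the test points are hypothetical and chosen only for illustration.

```python
import numpy as np

def std_normal_pdf(x):
    """Density of the n-dimensional normal with zero mean and identity covariance."""
    n = len(x)
    return np.exp(-0.5 * (x @ x)) / (2 * np.pi) ** (n / 2)

def trace_hessian_closed(x):
    """Closed form of (6.65): p(X) * (X^T X - n)."""
    return std_normal_pdf(x) * (x @ x - len(x))

def trace_hessian_fd(x, h=1e-4):
    """Sum of second central differences along each coordinate axis."""
    total = 0.0
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        total += (std_normal_pdf(x + e) - 2 * std_normal_pdf(x)
                  + std_normal_pdf(x - e)) / h ** 2
    return total

# Hypothetical test points for n = 3 (illustrative values only).
inside = np.array([0.5, 0.5, 0.5])   # X^T X = 0.75 < n
on_sphere = np.array([1.0, 1.0, 1.0])  # X^T X = 3 = n
outside = np.array([2.0, 1.0, 1.0])  # X^T X = 6 > n

for point in (inside, on_sphere, outside):
    print(trace_hessian_closed(point), trace_hessian_fd(point))
```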
Note that the optimal $A$ is also $I$ in this case. It is easy to show that, if
$p(X) = N_X(0, I)$, then $p^2(X) = 2^{-n/2} (2\pi)^{-n/2} N_X(0, I/2)$. Therefore,
