\[
\text{IMSE} = \int \text{MSE}\{\hat{p}(X)\}\, dX \; . \tag{6.33}
\]

Another possible criterion for obtaining the globally optimal $r$ is $E_X\{\text{MSE}\{\hat{p}(X)\}\} = \int \text{MSE}\{\hat{p}(X)\}\, p(X)\, dX$. The optimization of this criterion can be carried out in the same way as for the IMSE, and it produces a similar but slightly smaller $r$ than the IMSE. This criterion places more weight on the MSE in high-density areas, where the locally optimal $r$'s tend to be smaller.
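As a rough illustration of the difference between the two criteria, the following sketch (not from the text; the standard normal example, the sample sizes, the grids, and all names are illustrative assumptions) estimates $\text{MSE}\{\hat{p}(x)\}$ by Monte Carlo for a uniform-kernel Parzen estimate, then integrates it both unweighted (the IMSE) and weighted by $p(x)$:

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 100, 400                        # samples per estimate, Monte Carlo runs
    xs = np.linspace(-3.0, 3.0, 61)             # evaluation grid
    dx = xs[1] - xs[0]
    p = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)  # true N(0,1) density on the grid

    def mse_curve(r):
        # Monte Carlo estimate of MSE{p_hat(x)} at every grid point, for radius r.
        err2 = np.zeros_like(xs)
        for _ in range(trials):
            X = rng.normal(size=N)
            # uniform kernel: fraction of samples within radius r, over volume v = 2r
            p_hat = (np.abs(xs[:, None] - X[None, :]) <= r).mean(axis=1) / (2 * r)
            err2 += (p_hat - p) ** 2
        return err2 / trials

    radii = np.linspace(0.1, 1.2, 12)
    curves = [mse_curve(r) for r in radii]
    imse = [c.sum() * dx for c in curves]        # IMSE: unweighted integral of MSE
    wmse = [(c * p).sum() * dx for c in curves]  # E_X{MSE}: integral weighted by p
    print("r minimizing IMSE:    ", radii[np.argmin(imse)])
    print("r minimizing E_X{MSE}:", radii[np.argmin(wmse)])

The two minimizers are typically close; the analysis in the text predicts a slightly smaller $r$ for the density-weighted criterion.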
Since we have computed the bias and variance of $\hat{p}(X)$ in (6.18) and (6.19), $\text{MSE}\{\hat{p}(X)\}$ may be expressed as
\[
\text{MSE}\{\hat{p}(X)\} = \left[ E\{\hat{p}(X)\} - p(X) \right]^2 + \text{Var}\{\hat{p}(X)\} \; . \tag{6.34}
\]
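The decomposition in (6.34) is easy to verify numerically. The sketch below (illustrative assumptions throughout: an N(0,1) density, a single evaluation point, and a uniform kernel) checks that the Monte Carlo MSE of $\hat{p}(x_0)$ equals the squared bias plus the variance:

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials, r, x0 = 200, 5000, 0.5, 1.0
    p_true = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density at x0

    X = rng.normal(size=(trials, N))                  # 'trials' independent sample sets
    # uniform-kernel Parzen estimate at x0: count within radius r, over volume 2r
    p_hat = (np.abs(X - x0) <= r).mean(axis=1) / (2 * r)

    bias2 = (p_hat.mean() - p_true) ** 2
    var = p_hat.var()                                 # population variance (ddof=0)
    mse = np.mean((p_hat - p_true) ** 2)
    print(f"bias^2 + Var = {bias2 + var:.6f},  MSE = {mse:.6f}")  # the two agree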
In this section, only the uniform kernel function is considered. This is because the Parzen density estimate with the uniform kernel is more directly related to the $k$ nearest neighbor density estimate, which makes the comparison of the two easier. Since the normal and uniform kernels yield similar first and second order moments of $\hat{p}(X)$, the normal kernel function may be treated in the same way as the uniform kernel, and both produce similar results.
When the first order approximation is used, $\hat{p}(X)$ is unbiased as in (6.18), and therefore $\text{MSE} = \text{Var} = p/(Nv) - p^2/N$ as in (6.29). This criterion value is minimized by selecting $v = \infty$ for a given $N$ and $p$. That is, as long as the density function is linear in $L(X)$, the variance dominates the MSE of the density estimate and can be reduced by selecting a larger $v$. However, as soon as $L(X)$ is expanded and picks up the second order term of (6.10), the bias starts to appear in the MSE, and it grows with $r^2$ (or $v^{2/n}$) as in (6.18). Therefore, in minimizing the MSE, we select the best compromise between the bias and the variance.
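This compromise can be seen in a small experiment. Under the same illustrative assumptions as above, the empirical MSE at the mode of an N(0,1) density first falls as $r$ grows (the variance term $p/(Nv)$ shrinks) and then rises again once the $r^2$ bias of (6.18) takes over:

    import numpy as np

    rng = np.random.default_rng(2)
    N, trials, x0 = 100, 2000, 0.0
    p_true = 1.0 / np.sqrt(2 * np.pi)   # N(0,1) density at its mode

    for r in (0.05, 0.2, 0.5, 1.0, 2.0):
        X = rng.normal(size=(trials, N))
        p_hat = (np.abs(X - x0) <= r).mean(axis=1) / (2 * r)
        print(f"r = {r:4.2f}   MSE = {np.mean((p_hat - p_true)**2):.5f}")

In a typical run the MSE is minimized at an intermediate radius (here near $r \approx 0.5$), which is exactly the bias-variance compromise described above.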
In order to include the effect of the bias in our discussion, we have no choice but to select the second order approximation in (6.18); otherwise, the MSE criterion does not depend on the bias term. On the other hand, the variance term is included in the MSE no matter which approximation of (6.19) is used, the first or the second order. If the second order approximation is used, the accuracy of the variance may be improved. However, the degree of improvement may not warrant the extra complexity which the second order approximation brings in. Furthermore, it should be remembered that the optimal $r$ will be a function of $p(X)$. Since we never know the true value of