Page 296 - Introduction to Statistical Pattern Recognition

P. 296

27 8 Introduction to Statistical Pattern Recognition

(6.1 13)

(b) Uniform:

The reader may check that Tl” 2’12 (l-lh~)-~’~ for normal and (n+2)”’ for
uniform with m = 1 are close for a large n (10.3 and 10.1 respectively for
n = 100).

Effect of parameters: Let us examine (6.108) and (6.109) with m = 1.
These equations reveal how E { dkNN(X)) is affected by such parameters as n, k,
N, and p(X). The effect of k appears only in the second term of (6.109).
When m = 1 and n is large, T(k+I/n)lT(k) E k““ is close to 1 regardless of the
value of k. This means that the average distance to the first NN is almost the
same as the average distance to the second NN, and so on. The effect of N,
which appears in the third term of (6.109), is also minimal, since
T(N+l)lT(N+l+lln) N-”” 1 for large n. The effect of the location is
observed as p-””(X) in (6.108). When n is large, p-’”’(X) Z 1 regardless of
the value of p(X) unless p(X) is either extremely large or small. Thus,
E ( dkNN(X)] is highly influenced only by n and I E I. On the other hand, in the
global kNN distance ExE{dkNN(X)), the I1 I term in v cancels with the I Z I of
Ex{p-””(X)), and only n determines the averaged kNN distance. This is true
because the distances are normalized by C as in (6.24). Table 6-3 shows
ExE(dkNN(X)} for various n, k, and N for normal and uniform distributions
with covariance matrix I [18]. The parameter values are n = 10, k = 1, and
N = 100, unless otherwise indicated. It can be observed from Table 6-3 that
the effects of k and N are not significant for n = 10. This behavior is even
more apparent for higher-dimensions.

Although the above results are contrary to our intuition, they could be
better understood by observing the volume to the kth NN, v~NN, instead of the
distance. For example, dzNN/dNN = 2.551l2.319 El.1 and is close to 1 for a
10-dimensional normal distribution from Table 6-3. However, the ratio of the
corresponding volumes is V~NNIVNN = (~~NN/~NN)’O E2.6, which is not close to
1. That is, the effect of k on v~NN is significant. The same is true for the

291 292 293 294 295 296 297 298 299 300 301