Page 298 - Introduction to Statistical Pattern Recognition



[Figure] Fig. 6-2 Effect of location on the NN distance (experimental result vs. 1st order approximation).

because $\mathrm{Var}\{d\} = E_X E\{d^2(X)\} - [E_X E\{d(X)\}]^2 \cong 0$ if $\Gamma(x+\delta)/\Gamma(x) \cong x^{\delta}$ can be used as an approximation. Thus, all $d_{NN}(X)$ are close to their expected value. As expected from (6.108), $E\{d_{NN}(X)\}$ does not change much as the location varies from small to large values. The marginal density, $p(t)$, is also plotted in Fig. 6-2.
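The approximation $\Gamma(x+\delta)/\Gamma(x) \cong x^{\delta}$, and why it drives the variance toward zero, can be checked numerically. The sketch below assumes, for illustration only, that the first two moments of the NN distance have the form $E\{d^m\} = C^m\,\Gamma(x+m\delta)/\Gamma(x)$ for $m = 1, 2$. The exact expressions are those around (6.108) and are not reproduced here. Under that assumption the constant $C$ cancels, and $\mathrm{Var}\{d\}/E^2\{d\} = \Gamma(x+2\delta)\Gamma(x)/\Gamma(x+\delta)^2 - 1$, which becomes $x^{2\delta}/x^{2\delta} - 1 = 0$ once the approximation is applied. The helper names and the value of `delta` are hypothetical.

```python
import math

def gamma_ratio(x: float, delta: float) -> float:
    """Exact Gamma(x + delta) / Gamma(x), computed in log space to avoid overflow."""
    return math.exp(math.lgamma(x + delta) - math.lgamma(x))

def relative_error(x: float, delta: float) -> float:
    """Relative error of the approximation Gamma(x + delta) / Gamma(x) ~= x**delta."""
    return gamma_ratio(x, delta) / x**delta - 1.0

def normalized_variance(x: float, delta: float) -> float:
    """Var{d} / E{d}^2 under the illustrative assumption E{d^m} = C^m Gamma(x + m*delta) / Gamma(x).

    Equals Gamma(x + 2*delta) * Gamma(x) / Gamma(x + delta)**2 - 1; the constant C cancels.
    """
    log_ratio = math.lgamma(x + 2 * delta) + math.lgamma(x) - 2 * math.lgamma(x + delta)
    return math.exp(log_ratio) - 1.0

if __name__ == "__main__":
    delta = 0.5  # hypothetical exponent chosen for illustration
    for x in (1, 10, 100, 1000):
        print(f"x={x:5d}  rel. error={relative_error(x, delta):+.2e}  "
              f"Var/E^2={normalized_variance(x, delta):.2e}")
```

For small $x$ the approximation is rough (about 11% off at $x = 1$, $\delta = 0.5$). As $x$ grows, both the relative error and the normalized variance shrink roughly like $1/x$. This is the sense in which $\mathrm{Var}\{d\} \cong 0$.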

Intrinsic Dimensionality

Whenever we are confronted with high-dimensional data sets, it is usually advantageous for us to discover or impose some structure on the data.
Therefore, we might assume that the generation of the data is governed by a certain number of underlying parameters. The minimum number of parameters
required to account for the observed properties of the data, $n_e$, is called the intrinsic or effective dimensionality of the data set or, equivalently,
of the data-generating process. That is, when $n$ random variables are functions of $n_e$ variables, such as $x_i = g_i(y_1, \ldots, y_{n_e})$ $(i = 1, \ldots, n)$,
the intrinsic dimensionality of the X-space is $n_e$. The geometric interpretation is that the entire data set lies on a topological hypersurface of dimension $n_e$.
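To make the definition concrete, the sketch below generates $n = 3$ variables from $n_e = 2$ parameters, $x = (\cos 2\pi y_1, \sin 2\pi y_1, y_2)$. These data lie on a 2-dimensional cylindrical surface in 3-dimensional space. For contrast, it also generates data that fill a 3-dimensional cube. To estimate $n_e$, it uses a two-nearest-neighbor ratio estimator. This technique comes from the later literature and is swapped in here for illustration; it is not claimed to be the method of this book. If the data are locally uniform on an $n_e$-dimensional surface, the ratio $\mu = r_2/r_1$ of second to first NN distances has density $n_e \mu^{-n_e-1}$ for $\mu \ge 1$, so the maximum-likelihood estimate is $\hat{n}_e = N / \sum_i \ln \mu_i$. All function names are hypothetical.

```python
import math
import random

def two_nn_ratios(points):
    """For each point, return r2/r1: its 2nd-NN distance divided by its 1st-NN distance."""
    ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest distances seen so far
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        ratios.append(r2 / r1)
    return ratios

def intrinsic_dim_two_nn(points):
    """MLE of intrinsic dimensionality, N / sum(log(r2/r1)), assuming local uniformity."""
    ratios = two_nn_ratios(points)
    return len(ratios) / sum(math.log(mu) for mu in ratios)

def cylinder_sample(n_points, rng):
    """n = 3 variables generated by n_e = 2 parameters: x_i = g_i(y1, y2)."""
    pts = []
    for _ in range(n_points):
        y1, y2 = rng.random(), rng.random()
        pts.append((math.cos(2 * math.pi * y1), math.sin(2 * math.pi * y1), y2))
    return pts

def cube_sample(n_points, rng):
    """n = 3 variables with no lower-dimensional structure (uniform in the unit cube)."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]

if __name__ == "__main__":
    rng = random.Random(0)
    print("cylinder (n_e = 2):", round(intrinsic_dim_two_nn(cylinder_sample(500, rng)), 2))
    print("cube     (n_e = 3):", round(intrinsic_dim_two_nn(cube_sample(500, rng)), 2))
```

Both data sets live in the same 3-dimensional X-space. The estimate for the cylinder should fall near 2 and the estimate for the cube near 3, reflecting the dimension of the hypersurface on which each data set lies. The brute-force neighbor search is $O(N^2)$ and is meant only for small illustrative samples.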
