Effect of dimensionality: The dimensionality of the data appears to play an important role in
determining the relationship between the size of the bias and the sample size. As is shown in
Fig. 7-5, for small values of n (say, n ≤ 4), changing the sample size is an effective means of
reducing the bias. For larger values of n, however, increasing the number of samples becomes a
more and more futile means of improving the estimate. It is in these higher-dimensional cases
that improved techniques for accurately estimating the Bayes error are needed. It should be
pointed out that, in the expression for the bias of the NN error, n represents the local or
intrinsic dimensionality of the data, as discussed in Chapter 6. In many applications, the
intrinsic dimensionality is much smaller than the dimensionality of the observation space.
Therefore, in order to calculate the bias, it is necessary that the intrinsic dimensionality be
estimated from the data using (6.115).
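
To see concretely why enlarging the sample helps little once n is large, the short sketch below
(not from the text) assumes, for illustration, that the sample-size dependence of the bias decays
on the order of N^(-2/n), where n is the intrinsic dimensionality, and tabulates how much a
tenfold increase in N shrinks that factor.

```python
# Illustration only (assumption): the sample-size factor of the NN-error bias
# is taken to decay like N**(-2/n), with n the intrinsic dimensionality.

def sample_size_factor(N, n):
    """Assumed N**(-2/n) dependence of the bias on the sample size N."""
    return N ** (-2.0 / n)

for n in (2, 4, 8, 16):
    # Fraction of the factor that remains after a tenfold increase in N.
    ratio = sample_size_factor(16000, n) / sample_size_factor(1600, n)
    print(f"n = {n:2d}: 10x more samples leaves {100 * ratio:.0f}% of the factor")
```

Under this assumption a tenfold increase in N removes 90% of the factor when n = 2, but only
about 25% of it when n = 16, which is the diminishing return described above.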

Effect of densities: The expectation term of (7.35) gives the effect of the densities on the size
of the bias. In general, it is very hard to determine the effect of this term because of its
complexity. In order to investigate the general trends, however, we can compute the term
numerically for a normal case.

     Experiment 2: Computation of E{·} of (7.35)
          Data: I-I (Normal)
               M adjusted to give ε* = 2, 5, 10, 20, 30 (%)
          Dimensionality: n = 2, 4, 8, 16
          Sample size: N1 = N2 = 1600n
          Metric: A = I (Euclidean)
          Results: Table 7-2 [5]

In the experiment, B of (7.36) was evaluated at each generated sample point, where the
mathematical formulas based on the normality assumption were used to compute p(X) and qi(X).
The expectation of (7.35) was replaced by the sample mean taken over 1600n samples per class.
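
A minimal sketch of this averaging procedure is given below. It is not code from the text: the
Data I-I setup (two unit-covariance normal classes with equal priors, whose mean separation M is
chosen so that the Bayes error, Φ(-M/2), matches the target value) follows the experiment
description above, while eval_B is a hypothetical placeholder for B(X) of (7.36), which is not
reproduced on this page.

```python
import numpy as np
from scipy.stats import norm

def expectation_term(n, bayes_err, eval_B, seed=0):
    """Monte Carlo stand-in for the expectation of (7.35), Data I-I.

    eval_B is a hypothetical placeholder for B(X) of (7.36); it receives the
    generated samples together with p(X) and the posteriors q1(X), q2(X).
    """
    rng = np.random.default_rng(seed)
    # For two unit-covariance normals with equal priors, the Bayes error is
    # Phi(-M/2), so the mean separation M is set from the target error.
    M = -2.0 * norm.ppf(bayes_err)
    m1 = np.zeros(n)
    m2 = np.zeros(n)
    m2[0] = M                        # put the whole separation on the first axis
    N = 1600 * n                     # samples per class, as in Experiment 2

    X = np.vstack([rng.normal(m1, 1.0, size=(N, n)),
                   rng.normal(m2, 1.0, size=(N, n))])

    # Class-conditional densities from the normality assumption (identity covariance).
    p1 = np.exp(-0.5 * ((X - m1) ** 2).sum(axis=1)) / (2 * np.pi) ** (n / 2)
    p2 = np.exp(-0.5 * ((X - m2) ** 2).sum(axis=1)) / (2 * np.pi) ** (n / 2)
    p = 0.5 * p1 + 0.5 * p2          # mixture density p(X), equal priors
    q1 = 0.5 * p1 / p                # posterior q1(X)
    q2 = 0.5 * p2 / p                # posterior q2(X)

    # The expectation of (7.35) is replaced by the sample mean over 1600n
    # samples per class, with B(X) supplied by the caller.
    return float(np.mean(eval_B(X, p, q1, q2)))
```

The form of B(X) is deliberately left abstract here; any vectorized function of X, p(X), and the
posteriors can be averaged this way, so the sketch captures only the sampling and averaging
machinery of the experiment, not the specific integrand of (7.36).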
Table 7-2 reveals many properties of the expectation term. Special attention must be paid,
however, to the fact that, once n becomes large (n > 4), the value of n has little effect on the
size of the expectation. This implies that the factor of (7.37) dominates the effect of n on the
bias. That is, the bias is much larger in high dimensions. This coincides with the observation
that, in practice, the NN error comes down, contrary to theoretical expectation, by selecting a
smaller