Page 331 - Introduction to Statistical Pattern Recognition
P. 331
7 Nonparametric Classification and Error Estimation 313
error, and do not satisfy the inequality of (7.19). This is due to the large bias
of this estimation technique, which will be discussed in the next section.
7.3 Voting RNN Procedure-Finite Sample Analysis
So far, the asymptotic performance of the voting kNN procedure has
been studied. That is, the conditional risks and errors were derived based on
the assumption that qi(XkNN) = qi(X). However, in the real world, the number
of samples available is always finite. So, the question is how many samples
are needed so that the observed performance is similar to the asymptotic one
[2-71. In this section, finite sample analysis is presented for the voting NN and
2NN procedures.
Bias of the NN Error
We start our analysis with (7.9). With a finite number of samples, we
can no longer assume qj(XN,) = qj(X). Therefore, we define 6 as the differ-
ence between qi(X) and qi(XNN).
9 1 (XNN) = 4 1 (X) + 6 and ~~(XNN) = q2(X) - 6 . (7.26)
Equation (7.26) holds since ql(X) + q2(X) = 1 and ql(XNN)+ q2(XNN) = 1.
Substituting (7.26) into (7.9),
I', (X,X,N) = I'T (X) + [q*(X) - q 1 (XI16 . (7.27)
Thus, the bias between the finite sample and asymptotic NN errors may be
computed by taking the expectation of the second term of (7.27) with respect
to both XNN and X. In order to accomplish this, ql(X,,) is expanded in a
Taylor series around a given X. Terms higher than second order are discarded
and q (X) is subtracted to obtain
1
6 zV7q I (x)(x,,-x) + Ttr( V2q (x)(x,,-x)(x,,-x)~ 1 . (7.28)
I
The metric used to measure the NN distances is specified by
d2(Y,X) = (Y-X)TA-'(Y-X). In the case that A is held fixed, this is a global
metric. However, in the more general case, A may be allowed to vary with X,
forming a local metric. The same metric is used for both w, and w2 in this
section. However, a similar analysis could be done, even when two different
rnetrics are adopted for w1 and 02.

