Page 167 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
Nearest neighbour estimation is a method that implements such a
refinement. The method is based on the following observation. Let
$R(z) \subset \mathbb{R}^N$ be a hypersphere with volume $V$. The centre of $R(z)$ is $z$. If
the number of samples in the training set $T_k$ is $N_k$, then the number $n$ of
samples that fall within $R(z)$ has a binomial distribution with
expectation:
$$E[n] = N_k \int_{y \in R(z)} p(y|\omega_k)\,dy \approx N_k V p(z|\omega_k) \tag{5.27}$$
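The step behind (5.27) can be spelled out as follows, assuming the training samples are drawn independently from $p(y|\omega_k)$. The symbol $P_R$ is introduced here only for illustration and does not appear elsewhere in the text. Each sample falls inside $R(z)$ with probability $P_R$, so that:

$$P_R = \int_{y \in R(z)} p(y|\omega_k)\,dy, \qquad P(n) = \binom{N_k}{n} P_R^{\,n} (1 - P_R)^{N_k - n}, \qquad E[n] = N_k P_R$$

The approximation $P_R \approx V p(z|\omega_k)$ holds when the density is roughly constant over $R(z)$, that is, when the sphere is small compared with the scale on which $p(y|\omega_k)$ varies.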
Suppose that the radius of the sphere around $z$ is selected such that this
sphere contains exactly $\kappa$ samples. It is obvious that this radius depends
on the position z in the measurement space. Therefore, the volume will
depend on z. We have to write V(z) instead of V. With that, an estimate
of the density is:
$$\hat{p}(z|\omega_k) = \frac{\kappa}{N_k V(z)} \tag{5.28}$$
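The estimate in (5.28) can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names `hypersphere_volume` and `knn_density` are hypothetical, Euclidean distance is assumed, and the radius is taken as the distance from z to its nearest neighbour number kappa, so that the sphere just encloses kappa samples.

```python
import math
import random


def hypersphere_volume(radius, dim):
    """Volume of a dim-dimensional hypersphere with the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def knn_density(z, samples, kappa):
    """Estimate p(z | omega_k) as kappa / (N_k * V(z)), following (5.28).

    z:       point at which the density is estimated (tuple of floats)
    samples: training vectors of one class (list of tuples, same length as z)
    kappa:   number of samples the sphere around z must contain
    """
    n_k = len(samples)
    if not 1 <= kappa <= n_k:
        raise ValueError("kappa must lie between 1 and the number of samples")
    # Sorted Euclidean distances from z to every training sample.
    distances = sorted(math.dist(z, x) for x in samples)
    # Smallest radius whose sphere encloses kappa samples (assumed convention).
    radius = distances[kappa - 1]
    if radius == 0.0:
        raise ValueError("sphere has zero volume; increase kappa")
    return kappa / (n_k * hypersphere_volume(radius, len(z)))


if __name__ == "__main__":
    # Illustrative data only: 1000 samples from a standard normal density.
    random.seed(0)
    samples = [(random.gauss(0.0, 1.0),) for _ in range(1000)]
    print(knn_density((0.0,), samples, kappa=32))
```

Because the radius is recomputed for every z, the volume shrinks automatically where samples are dense and grows where they are sparse.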
The expression shows that in regions where $p(z|\omega_k)$ is large, the volume
is expected to be small. This is similar to having a small interpolation
zone. If, on the other hand, $p(z|\omega_k)$ is small, the sphere needs to grow in
order to collect the required samples.
The parameter $\kappa$ controls the balance between the bias and variance.
This is like the parameter h in Parzen estimation. The choice of $\kappa$ should
be such that:
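$$\begin{aligned}
\kappa \to \infty \ \text{as}\ N_k \to \infty &\quad \text{in order to obtain a low variance} \\
\kappa / N_k \to 0 \ \text{as}\ N_k \to \infty &\quad \text{in order to obtain a low bias}
\end{aligned} \tag{5.29}$$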
A suitable choice is to make $\kappa$ proportional to $\sqrt{N_k}$.
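A short sketch shows how a square-root rule satisfies both conditions in (5.29): kappa keeps growing with the number of samples while the ratio of kappa to the number of samples shrinks. The helper `choose_kappa` and its `scale` argument are hypothetical; the proportionality constant is a tuning choice that the text does not specify.

```python
import math


def choose_kappa(n_k, scale=1.0):
    """Return kappa proportional to sqrt(n_k), clipped to the range [1, n_k].

    scale is an assumed proportionality constant, not a value from the text.
    """
    kappa = round(scale * math.sqrt(n_k))
    return max(1, min(n_k, kappa))


for n_k in (100, 10_000, 1_000_000):
    kappa = choose_kappa(n_k)
    # kappa increases (10, 100, 1000) while kappa / n_k
    # decreases (0.1, 0.01, 0.001).
    print(n_k, kappa, kappa / n_k)
```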
Nearest neighbour estimation is of practical interest because it paves
the way to a classification technique that directly uses the training set,
i.e. without explicitly estimating probability densities. The development
of this technique is as follows. We consider the entire training
set and use the representation $T_S$ as in (5.1). The total number of
samples is $N_S$. Estimates of the prior probabilities follow from (5.18):
$\hat{P}(\omega_k) = N_k/N_S$.
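The prior estimate amounts to counting class labels in the training set. A minimal sketch, assuming the labels are available as a plain Python list (the function name `estimate_priors` is hypothetical):

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(omega_k) as N_k / N_S from the class labels of the training set."""
    n_s = len(labels)          # total number of samples N_S
    counts = Counter(labels)   # N_k for every class that occurs
    return {label: n_k / n_s for label, n_k in counts.items()}


# Three samples of class "a" and one of class "b".
print(estimate_priors(["a", "a", "b", "a"]))  # {'a': 0.75, 'b': 0.25}
```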
As before, let $R(z) \subset \mathbb{R}^N$ be a hypersphere with volume $V(z)$. In order
to classify a vector $z$ we select the radius of the sphere around $z$ such that
this sphere contains exactly $\kappa$ samples taken from $T_S$. These samples are