Page 162 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
NONPARAMETRIC LEARNING 151
with class ω_k that fall within the i-th bin. Then the probability density
within the i-th bin is estimated as:

\hat{p}(z|\omega_k) = \frac{N_{k,i}}{\mathrm{Volume}(R_i)\, N_k} \quad \text{with } z \in R_i \qquad (5.24)
For each class, the number N_{k,i} has a multinomial distribution with
parameters

P_{k,i} = \int_{z \in R_i} p(z|\omega_k)\, \mathrm{d}z \quad \text{with } i = 1, \ldots, N_{\mathrm{bin}}
where N_bin is the number of bins. The statistical properties of p̂(z|ω_k)
follow from arguments that are identical to those used in Section 5.2.5.
In fact, if we quantize the measurement vector to, for instance, the
nearest centre of gravity of the bins, we end up in a situation similar to
the one of Section 5.2.5. The conclusion is that histogramming works
fine if the number of samples within each bin is sufficiently large. With a
given size of the training set, the size of the bins must be large enough to
assure a minimum number of samples per bin. Hence, with a small
training set, or a high-dimensional measurement space, the resolution
of the estimate will be very poor.
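To make equation (5.24) concrete, the following is a minimal sketch in Python/NumPy (the book itself works in MATLAB); the bin edges and the Gaussian training samples are purely illustrative:

```python
import numpy as np

def histogram_density(train, edges):
    """Histogram estimate of p(z|w_k) on a 1-D grid, as in eq. (5.24).

    train : samples of one class, shape (N_k,)
    edges : bin edges, shape (N_bin + 1,)
    Returns per-bin density estimates N_{k,i} / (Volume(R_i) * N_k).
    """
    counts, _ = np.histogram(train, bins=edges)  # N_{k,i} per bin
    volumes = np.diff(edges)                     # Volume(R_i) of each bin
    return counts / (volumes * len(train))

# Illustrative training set: 2000 samples from a standard normal
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=2000)
edges = np.linspace(-4.0, 4.0, 17)               # 16 equal-width bins
p_hat = histogram_density(samples, edges)
```

By construction the estimate is non-negative and integrates to (approximately) one over the covered range, since the bin counts sum to N_k; what the text warns about is visible here too: widening the bins lowers the variance per bin but destroys resolution.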
Parzen estimation can be considered as a refinement of histogramming.
The first step in the development of the estimator is to consider
only one sample from the training set. Suppose that z_j ∈ T_k. Then, we
are certain that at this position in the measurement space the density is
nonzero, i.e. p(z_j|ω_k) ≠ 0. Under the assumption that p(z|ω_k) is
continuous over the entire measurement space, it follows that in a small
neighbourhood of z_j the density is likely to be nonzero too. However,
the further we move away from z_j, the less we can say about p(z|ω_k). The
basic idea behind Parzen estimation is that the knowledge gained by the
observation of z_j is represented by a function positioned at z_j and with an
influence restricted to a small vicinity of z_j. Such a function is called the
kernel of the estimator. It represents the contribution of z_j to the estimate.
Summing together the contributions of all vectors in the training
set yields the final estimate.
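The summing-of-kernels idea can be sketched as follows (again in Python/NumPy rather than the book's MATLAB); the Gaussian kernel and its width h are illustrative choices, not prescribed by the text:

```python
import numpy as np

def parzen_density(z, train, h=0.3):
    """Parzen estimate of p(z|w_k): the average of kernels centred
    at each training sample z_j.

    z     : points at which to evaluate the estimate, shape (M,)
    train : training samples of one class, shape (N_k,)
    h     : kernel width (an assumed, illustrative value)
    """
    z = np.atleast_1d(z)
    d = z[None, :] - np.asarray(train)[:, None]   # distances z - z_j, shape (N_k, M)
    # Gaussian kernel: maximal at distance zero, decaying with distance
    kernels = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=0)                   # sum of contributions / N_k

# Illustrative training set and evaluation grid
rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=500)
grid = np.linspace(-4.0, 4.0, 81)
p_hat = parzen_density(grid, train)
```

Because each kernel is itself a normalized density, the average over the training set is again a valid density; the width h plays the same role as the bin size in histogramming, trading variance against resolution.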
Let ρ(z, z_j) be a distance measure (Appendix A.2) defined in the measurement
space. The knowledge gained by the observation z_j ∈ T_k is
represented by the kernel h(ρ(z, z_j)), where h(·) is a function ℝ⁺ → ℝ⁺
such that h(ρ(z, z_j)) has its maximum at z = z_j, i.e. at ρ(z, z_j) = 0.
Furthermore, h(ρ(·, ·)) must be monotonically decreasing as ρ(·, ·) increases,