
with class $\omega_k$ that fall within the $i$-th bin. Then the probability density within the $i$-th bin is estimated as:

$$\hat{p}(\mathbf{z}|\omega_k) = \frac{N_{k,i}}{\mathrm{Volume}(R_i)\,N_k} \qquad \text{with } \mathbf{z} \in R_i \tag{5.24}$$
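For instance (an illustrative calculation with made-up numbers, not from the text): if the training set contains $N_k = 200$ samples of class $\omega_k$ and $N_{k,i} = 8$ of them fall in a bin with $\mathrm{Volume}(R_i) = 0.1$, then $\hat{p}(\mathbf{z}|\omega_k) = 8/(0.1 \times 200) = 0.4$ for every $\mathbf{z} \in R_i$.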

For each class, the number $N_{k,i}$ has a multinomial distribution with parameters

$$P_{k,i} = \int_{\mathbf{z} \in R_i} p(\mathbf{z}|\omega_k)\,\mathrm{d}\mathbf{z} \qquad \text{with } i = 1, \ldots, N_{\mathrm{bin}}$$
where $N_{\mathrm{bin}}$ is the number of bins. The statistical properties of $\hat{p}(\mathbf{z}|\omega_k)$ follow from arguments identical to those used in Section 5.2.5. In fact, if we quantize the measurement vector to, for instance, the nearest centre of gravity of the bins, we end up in a situation similar to the one of Section 5.2.5. The conclusion is that histogramming works fine if the number of samples within each bin is sufficiently large. With a given size of the training set, the bins must be large enough to assure a minimum number of samples per bin. Hence, with a small training set, or a large dimension of the measurement space, the resolution of the estimate will be very poor.
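As a concrete illustration, a one-dimensional MATLAB sketch of the estimator (5.24) could read as follows. The variable names and the random data are hypothetical; this is a sketch of the technique, not code from the book:

    % Histogram density estimate of eq. (5.24), 1-D sketch.
    zk   = randn(200, 1);            % hypothetical training samples of class k
    Nk   = numel(zk);                % N_k, the size of the training set
    Nbin = 10;                       % N_bin, the number of bins
    edges = linspace(min(zk), max(zk), Nbin + 1);
    width = edges(2) - edges(1);     % Volume(R_i) in one dimension

    Nki  = histcounts(zk, edges);    % N_{k,i}, samples falling in bin i
    phat = Nki / (width * Nk);       % eq. (5.24), evaluated for each bin

    z  = 0.3;                        % a test value
    i  = discretize(z, edges);       % index of the bin containing z
    pz = phat(i);                    % estimated density at z

Note that the estimate is piecewise constant: every test value falling in the same bin receives the same density.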
Parzen estimation can be considered as a refinement of histogramming. The first step in the development of the estimator is to consider only one sample from the training set. Suppose that $\mathbf{z}_j \in T_k$. Then, we are certain that at this position in the measurement space the density is nonzero, i.e. $p(\mathbf{z}_j|\omega_k) \neq 0$. Under the assumption that $p(\mathbf{z}|\omega_k)$ is continuous over the entire measurement space, it follows that in a small neighbourhood of $\mathbf{z}_j$ the density is likely to be nonzero too. However, the further we move away from $\mathbf{z}_j$, the less we can say about $p(\mathbf{z}|\omega_k)$. The basic idea behind Parzen estimation is that the knowledge gained by the observation of $\mathbf{z}_j$ is represented by a function positioned at $\mathbf{z}_j$ and with an influence restricted to a small vicinity of $\mathbf{z}_j$. Such a function is called the kernel of the estimator. It represents the contribution of $\mathbf{z}_j$ to the estimate. Summing together the contributions of all vectors in the training set yields the final estimate.
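To make the idea concrete, a minimal MATLAB sketch with a Gaussian kernel might look as follows. The kernel choice, the variable names and the data are illustrative assumptions, not taken from the text; the formal requirements on the kernel are given next:

    % Parzen estimate: one kernel contribution per training vector, summed.
    % All names and numbers below are hypothetical.
    zk = randn(200, 2);               % training set T_k (N_k x D), here random
    [Nk, D] = size(zk);
    sigma = 0.5;                      % kernel width (smoothing parameter)

    z  = [0.1, -0.2];                 % test vector
    d2 = sum((zk - z).^2, 2);         % squared distance of z to each z_j
    h  = exp(-d2/(2*sigma^2)) / (2*pi*sigma^2)^(D/2);  % Gaussian kernel values
    phat = sum(h) / Nk;               % averaged contributions: estimate of p(z|omega_k)

The width sigma controls the size of the vicinity over which each sample exerts its influence, and thus the smoothness of the resulting estimate.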
Let $\rho(\mathbf{z}, \mathbf{z}_j)$ be a distance measure (Appendix A.2) defined in the measurement space. The knowledge gained by the observation $\mathbf{z}_j \in T_k$ is represented by the kernel $h(\rho(\mathbf{z}, \mathbf{z}_j))$, where $h(\cdot)$ is a function $\mathbb{R}^+ \to \mathbb{R}^+$ such that $h(\rho(\mathbf{z}, \mathbf{z}_j))$ has its maximum at $\mathbf{z} = \mathbf{z}_j$, i.e. at $\rho(\mathbf{z}, \mathbf{z}_j) = 0$. Furthermore, $h(\rho(\cdot, \cdot))$ must be monotonically decreasing as $\rho(\cdot, \cdot)$ increases,