Page 159 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

148                                        SUPERVISED LEARNING

probabilities, we need to specify one extra parameter, $N_S$, which is the
number of samples.
  Intuitively, the following estimator is appropriate:

$$\hat{P}(\omega_k) = \frac{N_k}{N_S} \qquad (5.18)$$

The expectation of $N_k$ equals $N_S P(\omega_k)$. Therefore, $\hat{P}(\omega_k)$ is an
unbiased estimate of $P(\omega_k)$. The variance of a multinomially distributed
variable is $N_S P(\omega_k)(1 - P(\omega_k))$. Consequently, the variance of the
estimate is:

$$\mathrm{Var}[\hat{P}(\omega_k)] = \frac{P(\omega_k)(1 - P(\omega_k))}{N_S} \qquad (5.19)$$
This shows that the estimator is consistent. That is, if $N_S \to \infty$, then
$\mathrm{Var}[\hat{P}(\omega_k)] \to 0$. The required number of samples follows from the
constraint that $\sqrt{\mathrm{Var}[\hat{P}(\omega_k)]} \ll P(\omega_k)$. For instance,
if for some class we anticipate that $P(\omega_k) = 0.01$, and the permitted
relative error is 20%, i.e. $\sqrt{\mathrm{Var}[\hat{P}(\omega_k)]} = 0.2\,P(\omega_k)$,
then $N_S$ must be about 2500 in order to obtain the required precision.
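The counting estimator (5.18) and the resulting sample-size requirement can be checked numerically. The book's examples use MATLAB; the following Python/NumPy sketch is an illustrative translation, with a synthetic three-class label set whose sizes and priors are assumptions for illustration only:

```python
import numpy as np

# Draw a synthetic labeled training set (assumed priors, for illustration).
rng = np.random.default_rng(0)
labels = rng.choice(3, size=2500, p=[0.01, 0.39, 0.60])

# Eq. (5.18): prior estimate P_hat(w_k) = N_k / N_S.
N_S = labels.size
priors = np.bincount(labels, minlength=3) / N_S

# Required sample size from sqrt(Var) = r * P with Var as in eq. (5.19):
# P(1-P)/N_S = (r*P)^2  =>  N_S = (1-P) / (r^2 * P)
P, r = 0.01, 0.2
N_required = (1 - P) / (r**2 * P)
print(round(N_required))  # 2475, i.e. about 2500
```

For the anticipated prior $P(\omega_k) = 0.01$ and relative error $r = 0.2$, the formula gives $N_S = (1 - P)/(r^2 P) \approx 2475$, matching the "about 2500" in the text.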

            5.2.5  Binary measurements

Another example of a multinomial distribution occurs when the measurement
vector z can only take a finite number of states. For instance, if the
sensory system is such that each element in the measurement vector is
binary, i.e. either '1' or '0', then the number of states the vector can
take is at most $2^N$. Such a binary vector can be replaced with an
equivalent scalar z that only takes integer values from 1 up to $2^N$.
The conditional probability density $p(z|\omega_k)$ turns into a
probability function $P(z|\omega_k)$. Let $N_k(z)$ be the number of
samples in the training set with measurement z and class $\omega_k$.
$N_k(z)$ has a multinomial distribution.
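The mapping above can be sketched in code: pack each binary measurement vector into a scalar state $z \in \{1, \ldots, 2^N\}$ and count $N_k(z)$ per class. A minimal Python sketch (the tiny training set is a made-up example, not from the book):

```python
import numpy as np

def to_state(binary_vec):
    # Interpret the bits as a base-2 number; add 1 so states run 1..2**N.
    return int(sum(int(b) << i for i, b in enumerate(binary_vec))) + 1

N = 3  # number of binary elements per measurement vector
Z = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]])  # measurements
y = np.array([0, 0, 1, 1])                                   # class labels

states = np.array([to_state(z) for z in Z])

# N_k(z): count of training samples with state z and class k.
counts = np.zeros((2, 2**N), dtype=int)
for s, k in zip(states, y):
    counts[k, s - 1] += 1

print(counts[0])  # class 0 counts over the 2**N = 8 states
```

With $N_k(z)$ in hand, $P(z|\omega_k)$ can then be estimated by normalizing each class's counts, which is exactly the problem taken up next.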
  At first sight, one would think that estimating $P(z|\omega_k)$ is the same
type of problem as estimating the prior probabilities, as discussed in the
previous section: