

Strategies to adjust the parameters may be further categorized into ‘iterative’ and ‘non-iterative’. Non-iterative schemes are found when the performance measure allows for an analytic solution of the optimization. For instance, suppose that the set of parameters is denoted by \(\mathbf{w}\) and that the performance measure is a continuous function \(J(\mathbf{w})\) of \(\mathbf{w}\). The optimal solution is one which maximizes \(J(\mathbf{w})\). Hence, the solution must satisfy \(\partial J(\mathbf{w})/\partial\mathbf{w} = 0\).
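When the performance measure is quadratic in the parameters, the condition \(\partial J(\mathbf{w})/\partial\mathbf{w} = 0\) can be solved in closed form. The following MATLAB fragment is a minimal sketch of such a non-iterative scheme for the invented measure \(J(\mathbf{w}) = -\|\mathsf{Y}\mathbf{w} - \mathbf{t}\|^2\); setting its derivative to zero yields the normal equations.

% Non-iterative (analytic) optimization of a quadratic performance measure.
% Setting dJ/dw = 0 for J(w) = -||Y*w - t||^2 gives Y'*Y*w = Y'*t.
Y = [1 0.5; 1 1.3; 1 2.1; 1 3.4];   % measurement matrix (invented example)
t = [0.9; 1.8; 2.9; 4.2];           % target values (invented example)
w = (Y'*Y) \ (Y'*t)                 % analytic solution of the normal equations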
In iterative strategies, the procedure to find a solution is numerical. Samples from the training set are fed into the decision function, and the classes found are compared with the true classes. The outcome controls the adjustment of the parameters, which is made in a direction that improves the performance. By repeating this procedure, the parameters hopefully iterate towards the optimal solution.
The most popular search strategy is the gradient ascent method (also called steepest ascent)¹. Suppose that the performance measure \(J(\mathbf{w})\) is a continuous function of the parameters contained in \(\mathbf{w}\). Furthermore, suppose that \(\nabla J(\mathbf{w}) = \partial J(\mathbf{w})/\partial\mathbf{w}\) is the gradient vector. Then the gradient ascent method updates the parameters according to:

\[
\mathbf{w}(i+1) = \mathbf{w}(i) + \eta(i)\,\nabla J(\mathbf{w}(i)) \qquad (5.40)
\]

where \(\mathbf{w}(i)\) is the parameter vector obtained in the \(i\)-th iteration and \(\eta(i)\) is the so-called learning rate. If \(\eta(i)\) is selected too small, the process converges very slowly, but if it is too large, the process may overshoot the maximum, or oscillate near the maximum. Hence, a compromise must be found.
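To make the procedure concrete, the following MATLAB fragment is a minimal sketch of the update rule (5.40), applied to the invented performance measure \(J(\mathbf{w}) = -(\mathbf{w}-\mathbf{a})^{\mathrm{T}}(\mathbf{w}-\mathbf{a})\) with gradient \(\nabla J(\mathbf{w}) = -2(\mathbf{w}-\mathbf{a})\); the learning rate is simply kept constant.

% Gradient ascent according to (5.40) for a simple concave J(w).
a   = [2; -1];               % location of the maximum (invented example)
w   = [0; 0];                % initial parameter vector
eta = 0.1;                   % learning rate eta(i), kept constant here
for i = 1:100
    gradJ = -2*(w - a);      % gradient of J evaluated at w(i)
    w = w + eta*gradJ;       % w(i+1) = w(i) + eta(i)*gradJ(w(i))
end
w                            % approaches a; too large an eta overshoots or oscillates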
Different choices of the performance measure and of the search strategy lead to a multitude of different learning methods. This section confines itself to two-class problems. From the many iterative, gradient-based methods we only discuss ‘perceptron learning’ and ‘least squared error learning’. Perhaps the practical significance of these two methods is not very large, but they serve as an introduction to the more involved techniques of the succeeding sections.

            Perceptron learning

In a two-class problem, the decision function expressed in (5.35) is equivalent to a test \(g_1(\mathbf{y}) - g_2(\mathbf{y}) > 0\). If the test fails, it is decided for \(\omega_2\), otherwise for \(\omega_1\). The test can be accomplished equally well with a single linear function:




¹ Equivalently, \(J(\mathbf{w})\) can be defined as an error measure, in which case a gradient descent method should be applied to minimize it.