Strategies to adjust the parameters may be further categorized into ‘iterative’ and ‘non-iterative’. Non-iterative schemes are found when the performance measure allows for an analytic solution of the optimization. For instance, suppose that the set of parameters is denoted by $\mathbf{w}$ and that the performance measure is a continuous function $J(\mathbf{w})$ of $\mathbf{w}$. The optimal solution is one which maximizes $J(\mathbf{w})$. Hence, the solution must satisfy $\partial J(\mathbf{w})/\partial \mathbf{w} = 0$.
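As an illustration (not taken from the text), suppose the performance measure is the quadratic form below, built from a data matrix $\mathbf{X}$ and a target vector $\mathbf{t}$; these symbols are introduced here only for the example. Setting the derivative with respect to $\mathbf{w}$ to zero yields the optimum in closed form (assuming $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ is invertible):

$$J(\mathbf{w}) = -\|\mathbf{X}\mathbf{w} - \mathbf{t}\|^2, \qquad \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -2\,\mathbf{X}^{\mathsf{T}}(\mathbf{X}\mathbf{w} - \mathbf{t}) = \mathbf{0} \;\Rightarrow\; \mathbf{w} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{t}$$

A closed-form solution of this kind is what makes a non-iterative scheme possible; when no such solution exists, one has to resort to the iterative strategies described next.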
In iterative strategies the procedure to find a solution is numerical.
Samples from the training set are fed into the decision function. The
classes found are compared with the true classes. The result controls the
adjustment of the parameters. The adjustment is in a direction which
improves the performance. By repeating this procedure it is hoped that
the parameters iterate towards the optimal solution.
The most popular search strategy is the gradient ascent method (also called steepest ascent).¹ Suppose that the performance measure $J(\mathbf{w})$ is a continuous function of the parameters contained in $\mathbf{w}$. Furthermore, suppose that $\nabla J(\mathbf{w}) = \partial J(\mathbf{w})/\partial \mathbf{w}$ is the gradient vector. Then the gradient ascent method updates the parameters according to:

$$\mathbf{w}(i+1) = \mathbf{w}(i) + \eta(i)\,\nabla J(\mathbf{w}(i)) \qquad (5.40)$$

where $\mathbf{w}(i)$ is the parameter vector obtained in the i-th iteration and $\eta(i)$ is the so-called learning rate. If $\eta(i)$ is selected too small, the process converges very slowly, but if it is too large, the process may overshoot the maximum, or oscillate near the maximum. Hence, a compromise must be found.
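A minimal MATLAB sketch of the update rule (5.40) follows; it is not code from the book. For illustration it maximizes a simple quadratic measure with a known maximum rather than a measure computed from a training set, and the learning rate eta is kept constant over the iterations.

% Gradient ascent sketch for update rule (5.40).
% J is a toy performance measure with its maximum at w = [1; 2];
% gradJ is its gradient with respect to w.
J     = @(w) -(w - [1; 2])' * (w - [1; 2]);
gradJ = @(w) -2 * (w - [1; 2]);

w   = zeros(2, 1);           % initial parameter vector w(0)
eta = 0.1;                   % learning rate, here independent of i
for i = 1:100
    w = w + eta * gradJ(w);  % w(i+1) = w(i) + eta * grad J(w(i))
end
disp(w)                      % approaches the maximizer [1; 2]

With eta = 0.1 the iterate converges smoothly. For this particular measure, values of eta between 0.5 and 1 cause oscillation around the maximum and values above 1 make the process diverge, which illustrates the compromise mentioned above.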
Different choices of the performance measure and different search strategies lead to a multitude of different learning methods. This section confines itself to two-class problems. From the many iterative, gradient-based methods we only discuss ‘perceptron learning’ and ‘least squared error learning’. Perhaps the practical significance of these two methods is not very large, but they serve as an introduction to the more involved techniques of the succeeding sections.
Perceptron learning
In a two-class problem, the decision function expressed in (5.35) is equivalent to a test $g_1(\mathbf{y}) - g_2(\mathbf{y}) > 0$. If the test fails, it is decided for $\omega_2$, otherwise for $\omega_1$. The test can be accomplished equally well with a single linear function:
¹ Equivalently, we define $J(\mathbf{w})$ as an error measure. A gradient descent method should then be applied to minimize it.