Page 169 - Introduction to Statistical Pattern Recognition
4 Parametric Classifiers
(1) Positiveness of the $\gamma$'s can be guaranteed if we start with positive
numbers and never decrease their values.
This can be done by modifying $\Gamma$ in proportion to

$$\Delta\Gamma = C + |C| \tag{4.90}$$

instead of $C$, where

$$C = U^T W - \Gamma . \tag{4.91}$$
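Here $|C|$ is taken component by component, so each component of $\Delta\Gamma$ is positive (twice the corresponding component of $C$) or zero, depending on whether that component of $C$ is positive or not. Thus,
$\Gamma(\ell+1)$ is

$$\Gamma(\ell+1) = \Gamma(\ell) + \rho\,\Delta\Gamma = \Gamma(\ell) + \rho\,(C + |C|) , \tag{4.92}$$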
where $\rho$ is a properly selected positive constant. In this process, the $\gamma$'s are
increased or left unchanged at each iterative step, and $W$ is adjusted to reduce the error
between $\gamma(Z_j)$ and $W^T Z_j$. However, one should be reminded that the scale of the
$\gamma$'s and, subsequently, the scale of $W$ do not change the essential structure of
the classifier. That is, $W^T Z_j$ is the same classifier as $a W^T Z_j$, where $a$ is a positive constant.
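(2) On the other hand, there are no restrictions on $W$. Therefore, for a
given $\Gamma$, we can select $W$ to satisfy $\partial\bar{\varepsilon}^2/\partial W = 0$ in (4.88):

$$W = \frac{1}{N}\Bigl(\frac{1}{N}\,U U^T\Bigr)^{-1} U\,\Gamma = (U U^T)^{-1} U\,\Gamma \tag{4.93}$$

or,

$$W(\ell+1) = \frac{1}{N}\Bigl(\frac{1}{N}\,U U^T\Bigr)^{-1} U\,\Gamma(\ell+1) = W(\ell) + \frac{\rho}{N}\Bigl(\frac{1}{N}\,U U^T\Bigr)^{-1} U\,\Delta\Gamma(\ell) . \tag{4.94}$$

$W(\ell+1)$ minimizes $\bar{\varepsilon}^2$ for a given $\Gamma(\ell+1)$ at each iterative step.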
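
The two steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the text's own implementation: the columns of `U` are taken to be the vectors $Z_j$, already arranged so that $W^T Z_j > 0$ is the desired outcome for every sample, and $U U^T$ is assumed nonsingular. The function name `iterate_gamma_and_w`, the starting value $\gamma_j = 1$, the choice $\rho = 0.5$, the stopping tolerance, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def iterate_gamma_and_w(U, rho=0.5, n_iter=200, tol=1e-10):
    """Alternate the Gamma update (4.92) and the W update (4.94).

    U is an (n, N) array whose columns are the vectors Z_j, assumed to be
    arranged so that W^T Z_j > 0 is desired for every sample.
    Returns the final W, the final Gamma, and the history of ||C||^2.
    """
    n, N = U.shape
    # (U U^T)^{-1} U: maps a given Gamma to the minimizing W, as in (4.93).
    solve_map = np.linalg.solve(U @ U.T, U)

    gamma = np.ones(N)                  # assumption: start from gamma_j = 1 > 0
    W = solve_map @ gamma               # W for the initial Gamma, (4.93)
    history = []
    for _ in range(n_iter):
        C = U.T @ W - gamma             # (4.91)
        history.append(float(C @ C))    # squared norm of C at this step
        d_gamma = C + np.abs(C)         # (4.90): 2*c_j if c_j > 0, else 0
        if np.linalg.norm(d_gamma) < tol:
            break                       # no positive component left to correct
        gamma = gamma + rho * d_gamma       # (4.92): gammas never decrease
        W = W + rho * (solve_map @ d_gamma)  # (4.94)
    return W, gamma, history


# Hypothetical toy data: two linearly separable classes in two dimensions.
class_a = np.array([[2.0, 1.0], [3.0, 2.0], [2.0, 3.0]])
class_b = np.array([[-1.0, -1.0], [-2.0, 0.0], [0.0, -2.0]])
Z_a = np.hstack([np.ones((3, 1)), class_a])    # rows [1, x1, x2]
Z_b = -np.hstack([np.ones((3, 1)), class_b])   # second class is sign-flipped
U = np.vstack([Z_a, Z_b]).T                    # shape (3, 6), columns are Z_j

W, gamma, history = iterate_gamma_and_w(U)
print("W =", W)
print("all W^T Z_j positive:", bool(np.all(U.T @ W > 0)))
```

Because only the sign of $W^T Z_j$ matters for classification, multiplying the returned `W` by any positive constant leaves the decisions unchanged, which is the scale remark made in (1). The `history` list is returned so that the behavior of the norm of $C$ can be inspected.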
In order to see how $W$ converges by this optimization process, let us
study the norm of $C$. The vector $C$ makes the correction for both $\Gamma$ and $W$.
Also, from (4.91) and (4.87),