Page 169 - Introduction to Statistical Pattern Recognition

4 Parametric Classifiers 151

(1) Positiveness of the $\gamma$'s can be guaranteed if we start with positive
numbers and never decrease their values.

This can be done by modifying $\Gamma$ in proportion to
$$\Delta\Gamma = C + |C| \tag{4.90}$$
instead of $C$, where
$$C = U^T W - \Gamma . \tag{4.91}$$

Thus, the components of the vector $\Delta\Gamma$ are positive or zero, depending on
whether the corresponding components of $C$ are positive or negative. Consequently,
$\Gamma(\ell+1)$ is

$$\Gamma(\ell+1) = \Gamma(\ell) + \rho\,\Delta\Gamma = \Gamma(\ell) + \rho\,(C + |C|) , \tag{4.92}$$

where $\rho$ is a properly selected positive constant. In this process, the $\gamma$'s
are never decreased at any iterative step, and $W$ is adjusted to reduce the error
between $\gamma(Z_j)$ and $W^T Z_j$. Note, however, that the scale of the
$\gamma$'s and, consequently, the scale of $W$ do not change the essential structure of
the classifier. That is, $W^T Z_j$ is the same classifier as $aW^T Z_j$, where $a$ is a
positive constant.
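As an illustration of the update in (4.90) through (4.92), the following is a minimal Python sketch of a single step. The sample vectors, the weight vector, the starting values, and the step size are made-up numbers, and the helper names `dot` and `gamma_step` are hypothetical. The sketch only shows that components with positive error grow while the others stay fixed, and that scaling the weight vector by a positive constant leaves the sign of every discriminant value unchanged.

```python
# Illustrative sketch of one update of the desired-output vector, eqs. (4.90)-(4.92).
# All numbers are made up; samples[j] plays the role of the sample vector Z_j.

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def gamma_step(samples, W, Gamma, rho):
    """Return (C, dGamma, new_Gamma) for one step."""
    # C = U^T W - Gamma, eq. (4.91): one error component per sample
    C = [dot(W, Z) - g for Z, g in zip(samples, Gamma)]
    # Delta Gamma = C + |C|, eq. (4.90): 2*c where c > 0, otherwise 0
    dGamma = [c + abs(c) for c in C]
    # Gamma(l+1) = Gamma(l) + rho * Delta Gamma, eq. (4.92)
    new_Gamma = [g + rho * d for g, d in zip(Gamma, dGamma)]
    return C, dGamma, new_Gamma

samples = [(1.0, 2.0), (1.0, -1.0), (1.0, 0.5)]   # hypothetical Z_j
W = (1.0, 0.5)                                    # hypothetical weight vector
Gamma = [1.0, 1.0, 1.0]                           # positive starting values

C, dGamma, new_Gamma = gamma_step(samples, W, Gamma, rho=0.5)
print(C)          # [1.0, -0.5, 0.25]
print(dGamma)     # [2.0, 0.0, 0.5]
print(new_Gamma)  # [2.0, 1.0, 1.25]

# Scaling W by a positive constant a does not change any classification sign.
a = 3.0
signs = [dot(W, Z) > 0 for Z in samples]
scaled_signs = [dot([a * w for w in W], Z) > 0 for Z in samples]
print(signs == scaled_signs)  # True
```

The second component has a negative error, so its desired output is left at 1.0 rather than being reduced; this is the sense in which the values never decrease.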
(2) On the other hand, there are no restrictions on $W$. Therefore, for a
given $\Gamma$, we can select $W$ to satisfy $\partial\bar{\varepsilon}^2/\partial W = 0$ in (4.88):

$$W = \frac{1}{N}\, U\,\Gamma \tag{4.93}$$
or,

$$W(\ell+1) = W(\ell) + \frac{\rho}{N}\, U\,\Delta\Gamma(\ell) . \tag{4.94}$$

$W(\ell+1)$ minimizes $\bar{\varepsilon}^2$ for a given $\Gamma(\ell+1)$ at each iterative step.
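Substituting (4.92) into (4.93) gives (4.94) directly, so the incremental and the direct forms of the weight update should agree at every step. The sketch below checks this numerically in plain Python. It assumes that the sample matrix satisfies $UU^T = NI$, which is what the form of (4.93) suggests; the four sample vectors are hypothetical values chosen to meet that condition, and the function names, the step size 0.5, and the number of steps are likewise illustrative choices rather than anything prescribed by the text.

```python
# Illustrative sketch of the alternating updates (4.90)-(4.94).
# Assumption: the sample matrix U (columns Z_j) satisfies U U^T = N I.

def u_transpose_times(samples, W):
    """U^T W: one inner product W^T Z_j per sample."""
    return [sum(w * z for w, z in zip(W, Z)) for Z in samples]

def u_times(samples, v):
    """U v: weighted sum of the sample vectors Z_j."""
    n = len(samples[0])
    return [sum(Z[i] * vj for Z, vj in zip(samples, v)) for i in range(n)]

def iterate(samples, Gamma, rho, steps):
    """Run the updates; record (W, W_direct, Gamma, ||C||^2) per step."""
    N = len(samples)
    W = [x / N for x in u_times(samples, Gamma)]              # eq. (4.93)
    history = []
    for _ in range(steps):
        proj = u_transpose_times(samples, W)
        C = [p - g for p, g in zip(proj, Gamma)]              # eq. (4.91)
        dGamma = [c + abs(c) for c in C]                      # eq. (4.90)
        Gamma = [g + rho * d for g, d in zip(Gamma, dGamma)]  # eq. (4.92)
        step = u_times(samples, dGamma)
        W = [w + rho / N * s for w, s in zip(W, step)]        # eq. (4.94)
        W_direct = [x / N for x in u_times(samples, Gamma)]   # eq. (4.93)
        history.append((W, W_direct, Gamma, sum(c * c for c in C)))
    return history

# Hypothetical samples: sum of Z_j Z_j^T equals 4 I (up to rounding), N = 4.
samples = [(1.4, 0.2), (1.4, -0.2), (0.2, 1.4), (0.2, -1.4)]
history = iterate(samples, [1.0, 1.0, 1.0, 1.0], rho=0.5, steps=5)

for W, W_direct, Gamma, c_norm2 in history:
    print(W, W_direct, c_norm2)
```

Each printed row shows the weight vector from the incremental form next to the one recomputed from the current desired outputs, followed by the squared norm of the error vector used in that step.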

In order to see how W converges by this optimization process, let us
study the norm of C. The vector C makes the correction for both r and W.
Also, from (4.91) and (4.87),
