Page 169 - Introduction to Statistical Pattern Recognition

4  Parametric Classifiers                                     151



(1) Positiveness of the $\gamma$'s can be guaranteed if we start with positive numbers and never decrease their values.

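This can be done by modifying $\Gamma$ in proportion to

$$\Delta\Gamma = C + |C| \tag{4.90}$$

instead of $C$, where $|C|$ is the vector of absolute values of the components of $C$ and

$$C = U^T W - \Gamma. \tag{4.91}$$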

Thus, the components of the vector $\Delta\Gamma$ are positive or zero, depending on whether the corresponding components of $C$ are positive or negative. Accordingly, $\Gamma(\ell+1)$ is

$$\Gamma(\ell+1) = \Gamma(\ell) + \rho\,\Delta\Gamma = \Gamma(\ell) + \rho\,(C + |C|), \tag{4.92}$$

where $\rho$ is a properly selected positive constant. In this process, the $\gamma$'s never decrease at any iterative step, and $W$ is adjusted to reduce the error between $\gamma(Z_j)$ and $W^T Z_j$. Note, however, that the scale of the $\gamma$'s and, consequently, the scale of $W$ do not change the essential structure of the classifier. That is, $W^T Z_j$ is the same classifier as $a W^T Z_j$, where $a$ is a positive constant.
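The update of (4.90) through (4.92) can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function name `update_gamma`, the toy matrix `U` (samples $Z_j$ stored as columns), and the value of `rho` are all assumptions made for the example.

```python
import numpy as np

def update_gamma(U, W, gamma, rho):
    """One step of (4.92): Gamma <- Gamma + rho * (C + |C|)."""
    C = U.T @ W - gamma        # error vector C of (4.91)
    delta = C + np.abs(C)      # (4.90): 2*c_j where c_j > 0, zero otherwise
    return gamma + rho * delta, C

# Hypothetical toy data: three samples Z_j as the columns of U (n = 2).
U = np.array([[1.0, 2.0, 0.5],
              [1.0, 1.0, 1.0]])
W = np.array([1.0, 0.0])
gamma = np.array([0.5, 3.0, 0.5])   # positive starting values

new_gamma, C = update_gamma(U, W, gamma, rho=0.25)
print(C)          # components may have either sign
print(new_gamma)  # no component is smaller than in gamma
```

Only the components of `gamma` whose error is positive are changed, which is why positive starting values stay positive.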
(2) On the other hand, there are no restrictions on $W$. Therefore, for a given $\Gamma$, we can select $W$ to satisfy $\partial\bar{\varepsilon}^2/\partial W = 0$ in (4.88).

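$$W = \frac{1}{N}\left(\frac{1}{N}UU^T\right)^{-1} U\,\Gamma \tag{4.93}$$

or

$$W(\ell+1) = \frac{1}{N}\left(\frac{1}{N}UU^T\right)^{-1} U\,\Gamma(\ell+1) = W(\ell) + \frac{\rho}{N}\left(\frac{1}{N}UU^T\right)^{-1} U\,\Delta\Gamma(\ell). \tag{4.94}$$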

$W(\ell+1)$ minimizes $\bar{\varepsilon}^2$ for a given $\Gamma(\ell+1)$ at each iterative step.
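Steps (1) and (2) together give an alternating procedure, which might be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function name `alternate_gamma_w`, the starting vector of ones, the fixed iteration count, `rho = 0.5`, and the toy samples are hypothetical. The samples are assumed to be sign-normalized so that a correct classifier gives $W^T Z_j > 0$ for every $j$, and $UU^T$ is assumed to be nonsingular.

```python
import numpy as np

def alternate_gamma_w(U, rho=0.5, n_iter=100):
    """Alternate the Gamma update (4.92) with the W update (4.94)."""
    n, N = U.shape
    # (U U^T)^{-1} U, computed by solving a linear system instead of inverting.
    A = np.linalg.solve(U @ U.T, U)
    gamma = np.ones(N)                 # start from positive values
    W = A @ gamma                      # (4.93) for the initial Gamma
    for _ in range(n_iter):
        C = U.T @ W - gamma            # (4.91)
        delta = C + np.abs(C)          # (4.90)
        gamma = gamma + rho * delta    # (4.92): Gamma never decreases
        W = W + rho * (A @ delta)      # (4.94): W stays optimal for Gamma
    return W, gamma

# Hypothetical sign-normalized samples (columns), n = 3, N = 4.
U = np.array([[1.0, 1.0, -1.0, -1.0],
              [2.0, 3.0,  0.0,  1.0],
              [2.0, 1.0,  0.0, -2.0]])
W, gamma = alternate_gamma_w(U)
print(U.T @ W)   # discriminant values W^T Z_j
print(gamma)     # target values reached by the iteration
```

On this separable toy set the discriminant values end up matching the targets, and all of them are positive.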

In order to see how $W$ converges by this optimization process, let us study the norm of $C$. The vector $C$ makes the correction for both $\Gamma$ and $W$. Also, from (4.91) and (4.87),