(d)  L(ℓ) is a penalty vector whose components are functions of the corresponding components of Γ(ℓ).
A different approach is to treat the problem of finding a feasible solution of (4.73) as a linear programming problem with an artificially created cost vector.  For this approach, the reader is referred to a text on linear programming.
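As a sketch of how such a linear programming formulation might look (my own illustration, not a formulation from the text): assuming (4.73) denotes the strict inequalities z_i^T W > 0 for the sign-adjusted, augmented samples z_i, one artificial cost that works is to maximize a common margin t subject to z_i^T W >= t; a positive optimal t then certifies a feasible W.  The function name, the margin cap, and the variable box below are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

def feasible_w_via_lp(Z):
    """Look for W with Z @ W > 0 for every row of Z (sign-adjusted, augmented
    samples) by maximizing a common margin t with an artificial cost vector.
    Returns (W, t); t > 0 means the strict inequalities are satisfiable."""
    N, d = Z.shape
    # Variables x = [W_1, ..., W_d, t].  linprog minimizes c @ x,
    # so c = [0, ..., 0, -1] maximizes t.
    c = np.zeros(d + 1)
    c[-1] = -1.0
    # -Z W + t <= 0   <=>   z_i^T W >= t  for every sample.
    A_ub = np.hstack([-Z, np.ones((N, 1))])
    b_ub = np.zeros(N)
    # Box the variables so the LP stays bounded; any W with a positive margin
    # can be rescaled into the box, so feasibility is unaffected.
    bounds = [(-100.0, 100.0)] * d + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]
```

Since W = 0, t = 0 is always feasible, the optimal t is zero exactly when no solution of the strict inequalities exists, and positive when one does.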
A word of caution is in order here.  In addition to their complexity, all of the above approaches share a more fundamental disadvantage.  In (4.82) and (4.83), for example, the classifier is designed based only on the misclassified samples in the boundary region.  For a good classifier the number of misclassified samples tends to be small, and it is sometimes questionable whether these samples represent the true statistics of the boundary structure.  As a result, the resubstitution error, obtained by using the same sample set for both design and test, tends to be severely biased toward the optimistic side.  It is therefore advisable that independent samples always be used to test the performance of the classifier.
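The optimistic bias is easy to reproduce numerically.  The sketch below is my own generic illustration (a least-squares linear classifier on two Gaussian classes, not the procedures of (4.82) and (4.83)): the error counted on the design samples themselves typically comes out noticeably smaller than the error on a large independent test set.  All sample sizes and class parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n_per_class):
    """Two overlapping 5-dimensional Gaussian classes with labels +1 / -1."""
    X = np.vstack([rng.normal(+0.8, 1.0, size=(n_per_class, 5)),
                   rng.normal(-0.8, 1.0, size=(n_per_class, 5))])
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

def error_rate(W, X, y):
    Xa = np.hstack([X, np.ones((len(X), 1))])      # augment with a bias term
    return np.mean(np.sign(Xa @ W) != y)

# Design a linear classifier by least squares on a small design set.
X_design, y_design = sample(20)
Xa = np.hstack([X_design, np.ones((len(X_design), 1))])
W = np.linalg.lstsq(Xa, y_design, rcond=None)[0]

X_test, y_test = sample(5000)                      # independent test samples
print("resubstitution error  :", error_rate(W, X_design, y_design))
print("independent-test error:", error_rate(W, X_test, y_test))
```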

An iterative process and its convergence:  In order to see how the iterative process works, let us consider the third criterion of (4.76), in which the γ(z_i) are adjusted along with W under the constraint γ(z_i) > 0.  Also, let us assume that our coordinate system has already been transformed to whiten the sample covariance matrix, such that

          U U^T = N I .                                                  (4.86)
                      Since the result of  the procedure should not depend on  the coordinate system,
                      this  transformation simplifies the  discussion without loss of  generality.  Then
the mean-square error becomes

          \bar{\varepsilon}^2 = \frac{1}{N} (U^T W - \Gamma)^T (U^T W - \Gamma)
                              = W^T W - \frac{2}{N} W^T U \Gamma + \frac{1}{N} \Gamma^T \Gamma ,        (4.87)

where Γ = [γ(z_1) ... γ(z_N)]^T is the vector of desired outputs.  The gradients of ε̄² with respect to W and Γ are

          \frac{\partial \bar{\varepsilon}^2}{\partial W} = 2 \left( W - \frac{1}{N} U \Gamma \right) ,        (4.88)

          \frac{\partial \bar{\varepsilon}^2}{\partial \Gamma} = - \frac{2}{N} \left( U^T W - \Gamma \right) .        (4.89)
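As a quick numerical check of (4.86)-(4.89) (my own sketch, not part of the text), the following code whitens a random sample matrix so that U U^T = N I holds, then compares the closed-form gradients with central finite differences of the mean-square error.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 20
U = rng.normal(size=(d, N))                  # columns play the role of samples z_i

# Whiten so that U U^T = N I, as assumed in (4.86).
S = U @ U.T / N
w_eval, w_evec = np.linalg.eigh(S)
U = w_evec @ np.diag(w_eval ** -0.5) @ w_evec.T @ U
assert np.allclose(U @ U.T, N * np.eye(d))

W = rng.normal(size=d)
G = rng.uniform(0.1, 1.0, size=N)            # Gamma: desired outputs gamma(z_i) > 0

def mse(W, G):                               # eq. (4.87)
    e = U.T @ W - G
    return e @ e / N

grad_W = 2 * (W - U @ G / N)                 # eq. (4.88)
grad_G = -2 * (U.T @ W - G) / N              # eq. (4.89)

eps = 1e-6                                   # central finite differences
fd_W = np.array([(mse(W + eps * np.eye(d)[i], G) - mse(W - eps * np.eye(d)[i], G))
                 / (2 * eps) for i in range(d)])
fd_G = np.array([(mse(W, G + eps * np.eye(N)[i]) - mse(W, G - eps * np.eye(N)[i]))
                 / (2 * eps) for i in range(N)])
assert np.allclose(grad_W, fd_W, atol=1e-5)
assert np.allclose(grad_G, fd_G, atol=1e-5)
```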


In order to satisfy the constraint γ(z_i) > 0, a modification is made as follows: