
   Note that the first condition was reformulated by adding to the cost function
Φ(w) the term:

\[
C \sum_{i=1}^{n} \xi_i \tag{5-100}
\]

   This term is proportional to the sum of the penalties ξᵢ, scaled by a parameter
C. The minimization of Φ(w) therefore establishes an inverse influence between C
and the ξᵢ. For small C the influence of the ξᵢ is small, i.e., there is a large
tolerance to misclassification errors, with a tendency to use a wide margin. For
large C the influence of the ξᵢ is big, i.e., the solution tends to minimize the
errors, using a small margin. In practice, the value of C has to be chosen
experimentally, since it may have more than one "optimal" value.
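   To make this trade-off concrete, the following minimal sketch (not from the
book) fits a linear soft-margin SVM for several values of C and reports the
margin width 2/‖w‖ and the total slack Σᵢ ξᵢ; scikit-learn's SVC and the
synthetic overlapping Gaussian data are assumptions of this illustration:

# Illustration (assumed setup): effect of C on margin width and total slack.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping classes: a non-separable situation as in Figure 5.46.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2.0 / np.linalg.norm(w)                # geometric margin width
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))  # xi_i = max(0, 1 - y_i(w.x_i + b))
    print(f"C={C:>6}: margin width = {margin:.2f}, total slack = {slack.sum():.2f}")

   On such data, raising C should shrink both the total slack and the margin
width together, matching the trade-off described above.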




















   [Figure 5.46. Optimal linear discriminant for a non-separable class situation.]




   The solution of the quadratic programming problem with the reformulation
(5-100) is obtained in a similar way to the previous linearly separable problem
(5-89). In fact, the formulation of the dual problem for the determination of the
Lagrange multipliers is the same, with the multipliers now satisfying the more
restrictive condition:

\[
0 \le \alpha_i \le C, \qquad i = 1, \ldots, n.
\]
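   As an illustration only (not the book's algorithm), the sketch below solves
this dual problem numerically with SciPy's general-purpose SLSQP solver; the
equality constraint Σᵢ αᵢ yᵢ = 0 carries over from the separable case, and the
box bounds implement the condition above. A dedicated quadratic-programming
solver would normally be preferred in practice:

# Illustration (assumed setup): soft-margin dual with box constraints.
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y, C):
    # Dual: maximize sum(alpha) - 1/2 alpha' Q alpha
    # subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0.
    n = len(y)
    G = y[:, None] * X                      # rows y_i * x_i
    Q = G @ G.T                             # Q_ij = y_i y_j (x_i . x_j)
    objective = lambda a: 0.5 * a @ Q @ a - a.sum()   # negated dual, minimized
    constraint = {"type": "eq", "fun": lambda a: a @ y}
    res = minimize(objective, np.zeros(n), method="SLSQP",
                   bounds=[(0.0, C)] * n, constraints=[constraint])
    return res.x

# Small two-class sample of the same overlapping kind as in Figure 5.46.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha = svm_dual(X, y, C=1.0)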
   Also, formula (5-94) for the weight vector now uses a summation for the
support vectors alone:
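   (The page breaks before the formula itself; in the standard soft-margin
formulation it reads \( \mathbf{w}^{*} = \sum_{\alpha_i > 0} \alpha_i\, y_i\,
\mathbf{x}_i \), the sum running only over the training points with nonzero
multipliers, with yᵢ = ±1 the class labels; that notation is an assumption
here.) Continuing the sketch above (reusing X, y, and alpha), this might be
computed as follows; the tolerance 1e-6 is an assumption to absorb numerical
noise from the solver:

sv = alpha > 1e-6                                       # support vectors: alpha_i > 0
w = ((alpha[sv] * y[sv])[:, None] * X[sv]).sum(axis=0)  # w = sum alpha_i y_i x_i
print(f"{sv.sum()} support vectors out of {len(y)} points, w = {w}")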