220 5 Neural Networks
Note that the first condition was reformulated by adding to the cost function
Φ(w) the term:

$$C \sum_{i=1}^{n} \xi_i$$
This term is proportional to the sum of the slack penalties, ξᵢ, scaled by a
parameter C. The minimization of Φ(w) therefore imposes an inverse influence
between C and the ξᵢ. For large C the influence of the ξᵢ is big, i.e., the
solution tends to minimize the misclassification errors at the price of a small
margin. For small C the influence of the ξᵢ in the minimization is small, i.e.,
there is a large tolerance to misclassification errors with a tendency to use a
wide margin. In practice, the value of C has to be chosen experimentally, since
there may be more than one "optimal" value.
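The opposing effects of C can be checked numerically. The sketch below (not from the book) minimizes the primal soft-margin objective ½‖w‖² + C·Σᵢ max(0, 1 − yᵢ w·xᵢ) by subgradient descent, with the bias omitted and a symmetric two-point toy set, so the optimum can be worked out by hand; the function name and data are illustrative assumptions.

```python
import numpy as np

def train_soft_margin(X, y, C, epochs=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w @ x_i)).
    Bias term omitted (the toy data below is symmetric about the origin)."""
    w = np.zeros(X.shape[1])
    for t in range(epochs):
        viol = y * (X @ w) < 1                   # points with nonzero slack xi_i
        # subgradient of the objective at the current w
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= grad / (t + 1.0)                    # decaying step size
    return w

# toy set: class +1 at x = +1, class -1 at x = -1
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])

# Small C: slack term weakly penalized, ||w|| stays small -> wide margin 2/||w||
# (here w -> 0.2, both points tolerated inside the margin).
w_small = train_soft_margin(X, y, C=0.1)
# Large C: slack term heavily penalized -> hard-margin solution (here w -> 1).
w_large = train_soft_margin(X, y, C=10.0)
```

A smaller C thus yields a smaller ‖w‖, i.e., a wider geometric margin with more tolerance to margin violations, matching the discussion above.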
Figure 5.46. Optimal linear discriminant for a non-separable class situation.
The solution of the quadratic programming problem with reformulation (5-100)
is obtained in a similar way to the previous linearly separable problem (5-89). In
fact, the formulation of the dual problem for determination of the Lagrange
multipliers is the same, with the multipliers now satisfying the more restrictive
condition:

$$0 \le \alpha_i \le C, \quad i = 1, \ldots, n$$
Also, formula (5-94) for the weight vector now uses a summation over the
support vectors alone:

$$\mathbf{w} = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$$
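Both facts, the box constraint 0 ≤ αᵢ ≤ C and the weight vector as a sum over the support vectors only, can be illustrated with a small projected-gradient solver for the dual problem. This is a hedged sketch under the same two-point toy data as before, not the book's algorithm; the solver name and step sizes are assumptions.

```python
import numpy as np

def solve_dual(X, y, C, lr=0.01, epochs=5000):
    """Projected gradient ascent on the dual: max sum(a) - 0.5 * a' Q a,
    subject to 0 <= a_i <= C and sum_i a_i y_i = 0."""
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                      # Q_ij = y_i * y_j * (x_i . x_j)
    a = np.zeros(len(y))
    for _ in range(epochs):
        a += lr * (1.0 - Q @ a)        # ascent step on the dual objective
        a -= y * (y @ a) / (y @ y)     # project onto the equality constraint
        a = np.clip(a, 0.0, C)         # enforce the box constraint 0 <= a_i <= C
    return a

# toy set: class +1 at x = +1, class -1 at x = -1
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])

a = solve_dual(X, y, C=10.0)           # box inactive here: a = [0.5, 0.5]
sv = a > 1e-6                          # support vectors have nonzero multipliers
w = ((a[sv] * y[sv])[:, None] * X[sv]).sum(axis=0)   # sum over SVs only
```

With a small C (e.g. C = 0.3 on the same data) the multipliers are clipped at the upper bound C, showing the box constraint becoming active and shrinking ‖w‖.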