
   Note that the first condition was reformulated by adding to the cost function
Φ(w) the term:

\[
C \sum_{i=1}^{n} \xi_i \tag{5-100}
\]

   This term is proportional to the sum of the penalties ξᵢ, scaled by a parameter
C. The minimization of Φ(w) therefore establishes an inverse influence between C
and the ξᵢ. For small C the influence of the ξᵢ is small, i.e., there is a large
tolerance to misclassification errors, with a tendency to use a wide margin. For
large C the influence of the ξᵢ is big, i.e., the solution tends to minimize the
errors, using a small margin. In practice, the value of C has to be chosen
experimentally, since it may have more than one "optimal" value.
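   To make this trade-off concrete, the following minimal sketch (not from the
book) fits a linear soft-margin SVM for several values of C and reports the
margin width 2/‖w‖ and the total slack Σᵢ ξᵢ; scikit-learn's SVC and the
synthetic overlapping Gaussian data are assumptions of this illustration:

# Illustration (assumed setup): effect of C on margin width and total slack.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping classes: a non-separable situation as in Figure 5.46.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2.0 / np.linalg.norm(w)                # geometric margin width
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))  # xi_i = max(0, 1 - y_i(w.x_i + b))
    print(f"C={C:>6}: margin width = {margin:.2f}, total slack = {slack.sum():.2f}")

   On such data, raising C should shrink both the total slack and the margin
width together, matching the trade-off described above.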




















   [Figure 5.46. Optimal linear discriminant for a non-separable class situation.]




   The solution of the quadratic programming problem with the reformulation
(5-100) is obtained in a similar way to the previous linearly separable problem
(5-89). In fact, the formulation of the dual problem for the determination of the
Lagrange multipliers is the same, with the multipliers now satisfying the more
restrictive condition:

\[
0 \le \alpha_i \le C, \qquad i = 1, \ldots, n.
\]
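   As an illustration only (not the book's algorithm), the sketch below solves
this dual problem numerically with SciPy's general-purpose SLSQP solver; the
equality constraint Σᵢ αᵢ yᵢ = 0 carries over from the separable case, and the
box bounds implement the condition above. A dedicated quadratic-programming
solver would normally be preferred in practice:

# Illustration (assumed setup): soft-margin dual with box constraints.
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y, C):
    # Dual: maximize sum(alpha) - 1/2 alpha' Q alpha
    # subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0.
    n = len(y)
    G = y[:, None] * X                      # rows y_i * x_i
    Q = G @ G.T                             # Q_ij = y_i y_j (x_i . x_j)
    objective = lambda a: 0.5 * a @ Q @ a - a.sum()   # negated dual, minimized
    constraint = {"type": "eq", "fun": lambda a: a @ y}
    res = minimize(objective, np.zeros(n), method="SLSQP",
                   bounds=[(0.0, C)] * n, constraints=[constraint])
    return res.x

# Small two-class sample of the same overlapping kind as in Figure 5.46.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha = svm_dual(X, y, C=1.0)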
   Also, formula (5-94) for the weight vector now uses a summation for the
support vectors alone:
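   (The page breaks before the formula itself; in the standard soft-margin
formulation it reads \( \mathbf{w}^{*} = \sum_{\alpha_i > 0} \alpha_i\, y_i\,
\mathbf{x}_i \), the sum running only over the training points with nonzero
multipliers, with yᵢ = ±1 the class labels; that notation is an assumption
here.) Continuing the sketch above (reusing X, y, and alpha), this might be
computed as follows; the tolerance 1e-6 is an assumption to absorb numerical
noise from the solver:

sv = alpha > 1e-6                                       # support vectors: alpha_i > 0
w = ((alpha[sv] * y[sv])[:, None] * X[sv]).sum(axis=0)  # w = sum alpha_i y_i x_i
print(f"{sv.sum()} support vectors out of {len(y)} points, w = {w}")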