
5.10 Support Vector Machines   217


Hence, when maximizing the separation margin r, we are in fact minimizing the Vapnik-Chervonenkis dimension and, therefore, minimizing the structural risk. As can be seen from expression (5-63), this is particularly beneficial for a small value of n/d_VC (say below 20). Also note that the complexity of the SVM approach depends on the number of support vectors used, and is independent of the problem's dimensionality (for more details see e.g. Cherkassky and Mulier, 1998).
For the canonical hyperplane the norm of the weight vector controls the margin of separation. The SVM approach of maximizing the margin of separation (equivalent to minimizing the norm of the weight vector) can then be expressed as the following constrained optimisation problem:
    minimize    Φ(w) = ½ ‖w‖²,                                        (5-89)

    subject to  t_i(w'x_i + w_0) ≥ 1,   i = 1, ..., n.
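As an illustration of the canonical-hyperplane formulation, the following sketch (with assumed toy data and an assumed candidate hyperplane, not taken from the text) checks the constraints t_i(w'x_i + w_0) ≥ 1 and computes the margin 1/‖w‖:

```python
# Illustrative sketch (assumed toy data): check the canonical-form
# constraints t_i(w'x_i + w0) >= 1 and compute the margin 1/||w||.
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])  # samples
t = np.array([1.0, 1.0, -1.0, -1.0])                            # class labels

w = np.array([1.0, 1.0])   # candidate weight vector (assumed, not optimal)
w0 = -3.0                  # candidate bias

feasible = bool(np.all(t * (X @ w + w0) >= 1))  # all constraints satisfied?
margin = 1.0 / np.linalg.norm(w)                # separation margin 1/||w||
```

Minimizing ‖w‖² over all feasible (w, w_0) then yields the hyperplane with the largest such margin.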

Notice that the minimization of the cost function Φ(w) will assure the minimization of the norm of the weight vector, with the advantage that, Φ(w) being a quadratic function, there is a standard method for solving this so-called quadratic programming problem. This standard method is called the Lagrange multipliers method, and consists of computing the saddle point of the Lagrangian function:

    J(w, w_0, α) = ½ w'w − Σ_{i=1}^{n} α_i [t_i(w'x_i + w_0) − 1].    (5-90)

The saddle point is obtained by minimizing the Lagrangian with respect to the weights plus bias and maximizing it with respect to the nonnegative Lagrange multipliers α_i.
If we differentiate the Lagrangian function with respect to w and w_0, the following optimality conditions are derived:

    ∂J/∂w = 0    ⇒    w = Σ_{i=1}^{n} α_i t_i x_i ;                   (5-91)

    ∂J/∂w_0 = 0  ⇒    Σ_{i=1}^{n} α_i t_i = 0.                        (5-92)
In order to compute the weights all we need now are the values of the Lagrangian multipliers. When we expand the expression of J(w, w_0, α) and use the results (5-91) and (5-92), an expression depending only on α is obtained:

    Q(α) = Σ_{i=1}^{n} α_i − ½ Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j t_i t_j x_i'x_j .    (5-93)
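The dual problem of maximizing Q(α) subject to α_i ≥ 0 and Σ α_i t_i = 0 can be handed to any quadratic programming solver. The sketch below (toy data and the SciPy SLSQP solver are illustrative choices, not the book's) maximizes Q(α) and then recovers the weights from (5-91) and the bias from a support vector:

```python
# Sketch: solve the SVM dual (5-93) numerically, then recover w via (5-91).
# The toy data and the choice of SciPy's SLSQP solver are assumptions.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])  # samples
t = np.array([1.0, 1.0, -1.0, -1.0])                            # labels
n = len(t)
G = (t[:, None] * X) @ (t[:, None] * X).T   # matrix of t_i t_j x_i'x_j

def neg_Q(a):
    # minimize -Q(alpha), i.e. maximize Q(alpha) of (5-93)
    return -(a.sum() - 0.5 * a @ G @ a)

res = minimize(neg_Q, np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                      # alpha_i >= 0
               constraints={"type": "eq",
                            "fun": lambda a: a @ t})          # (5-92)
alpha = res.x
w = (alpha * t) @ X                  # weights from (5-91)
sv = alpha > 1e-4                    # support vectors have alpha_i > 0
w0 = float(np.mean(t[sv] - X[sv] @ w))   # bias from the support vectors
```

At the solution only the support vectors carry nonzero α_i, and the constraints t_i(w'x_i + w_0) ≥ 1 are active exactly at those vectors.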
The extremal problems (5-89) and (5-93) are known as primal and dual problems, respectively. The dual problem consists of maximizing Q(α), yielding nonnegative Lagrange multipliers that satisfy condition (5-92). Notice that the