5.10 Support Vector Machines
which has the following solution in nonnegative αi: α1 = 0; α2 = 1; α3 = 3/4; α4 = 1/4.
Applying (5-91), we determine the optimal weight vector:

$$\mathbf{w}^{*} = \sum_{i=1}^{4} \alpha_i t_i \mathbf{x}_i = \begin{bmatrix} -2 & -2 \end{bmatrix}^{\prime}.$$
Hence, the linear discriminant is a straight line at 45° and the support vectors are the points x2, x3 and x4 (those with non-zero Lagrange multipliers), allowing us to determine the optimal bias using points x2 and x3:

$$w_0^{*} = t_2 - \mathbf{w}^{*\prime}\mathbf{x}_2 = t_3 - \mathbf{w}^{*\prime}\mathbf{x}_3 = 3.$$
The canonical hyperplane is, therefore, d(x) = 3 − 2x1 − 2x2 = 0, satisfying condition (5-87) for the support vectors.
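As an illustration of these mechanics, below is a minimal sketch, assuming a hypothetical separable 2-D data set (not the book's example points, which are not reproduced here) and using scikit-learn's SVC, of how the support vectors, the products αiti, the weight vector and the bias can be recovered; a very large C approximates the hard-margin (separable) formulation:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable points with targets t_i in {-1, +1}.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],   # class +1
              [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])  # class -1
t = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (separable) case.
svm = SVC(kernel="linear", C=1e6).fit(X, t)

w = svm.coef_[0]        # weight vector, cf. the weighted sum in (5-91)
w0 = svm.intercept_[0]  # optimal bias
print("support vectors:", svm.support_vectors_)
print("alpha_i * t_i  :", svm.dual_coef_[0])  # non-zero only for SVs
print("w* =", w, " w0* =", w0)

# Support vectors of a canonical hyperplane satisfy t_i d(x_i) = 1.
d = X[svm.support_] @ w + w0
print("t_i d(x_i) at SVs:", t[svm.support_] * d)  # close to 1
```

The printed values of ti d(xi) at the support vectors should be close to one, which is exactly the canonical condition (5-87) referred to above.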
Concerning the performance of SVM classifiers in the case of linearly separable classes, the work of Raudys (1997) has shown that the classification error is mainly determined by the dimensionality ratio, and that larger separation margins result, on average, in better generalization.
When the classes are non-separable, the optimal hyperplane must take into account the deviations from the ideal separable situation. In the approach introduced by Cortes and Vapnik (1995), the conditions (5-89) for the determination of the optimal hyperplane are reformulated as:


$$\text{minimize}\quad \Phi(\mathbf{w}, \boldsymbol{\xi}) = \tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_i\,,$$

$$\text{subject to}\quad t_i(\mathbf{w}^{\prime}\mathbf{x}_i + w_0) \ge 1 - \xi_i\,,\quad i = 1, \ldots, n\,,$$

where the ξi are nonnegative slack variables, penalizing the deviation of a data point from the ideal separable situation.
For a point falling on the right side of the decision region but inside the region of separation, the value of ξi is smaller than one. This is the situation of point x1 in Figure 5.46. For a point falling on the wrong side of the decision region a bigger penalty, with ξi > 1, is applied. This is the situation of the points x2 and x3.
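To make the role of the slack variables concrete, here is a minimal sketch, assuming hypothetical overlapping data (not the data of Figure 5.46) and scikit-learn's SVC: after fitting a soft-margin linear SVM with a finite C, each ξi can be recovered as max(0, 1 − ti d(xi)), with 0 < ξi < 1 flagging points inside the region of separation and ξi > 1 flagging misclassified points:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping classes, targets t_i in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)),
               rng.normal(1.5, 1.0, (30, 2))])
t = np.array([1] * 30 + [-1] * 30)

svm = SVC(kernel="linear", C=1.0).fit(X, t)  # finite C allows slack
d = X @ svm.coef_[0] + svm.intercept_[0]     # d(x_i) = w'x_i + w0
xi = np.maximum(0.0, 1.0 - t * d)            # slack of each point

# 0 < xi_i < 1: right side of the decision border, inside the margin;
# xi_i > 1   : wrong side of the decision border (misclassified).
print("inside margin :", int(np.sum((xi > 0) & (xi < 1))))
print("misclassified :", int(np.sum(xi > 1)))
```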
The support vectors are now the vectors that satisfy the condition:

$$t_i(\mathbf{w}^{\prime}\mathbf{x}_i + w_0) = 1 - \xi_i\,.$$