which has the following solution in nonnegative $\alpha_i$: $\alpha_1 = 0$; $\alpha_2 = 1$; $\alpha_3 = 3/4$; $\alpha_4 = 1/4$.
Applying (5-91), we determine the optimal weight vector:

\[
\mathbf{w}^* = \sum_{i=1}^{4} \alpha_i t_i \mathbf{x}_i = [\,-2 \;\; -2\,]' .
\]
Hence, the linear discriminant is a straight line at 45° and the support vectors
are the points x2, x3 and x4 (with non-zero Lagrange multipliers), allowing us to
determine the optimal bias using points x2 and x3:

\[
w_0^* = 3 .
\]
The canonical hyperplane is, therefore, $d(\mathbf{x}) = 3 - 2x_1 - 2x_2 = 0$, satisfying
condition (5-87) for the support vectors.
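As a practical illustration of (5-91) and of the bias computation just described, the following Python sketch (assuming the scikit-learn library and a small hypothetical separable data set, not the data of the example above) recovers the weight vector from the support vectors and their Lagrange multipliers, and the bias from one margin support vector:

import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable 2-D data (not the data of the worked example).
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [2.0, 2.0], [3.0, 2.5], [2.5, 3.0]])
t = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (separable) formulation.
svm = SVC(kernel='linear', C=1e6).fit(X, t)

# Weight vector as in (5-91): w = sum_i alpha_i t_i x_i;
# dual_coef_ already stores alpha_i * t_i for the support vectors.
w = (svm.dual_coef_ @ svm.support_vectors_).ravel()
print("support vectors:\n", svm.support_vectors_)
print("w from (5-91):", w, "  coef_:", svm.coef_.ravel())

# Bias from any margin support vector x_s:  w0 = t_s - w'x_s.
s = svm.support_[0]
w0 = t[s] - X[s] @ w
print("w0 from a support vector:", w0, "  intercept_:", svm.intercept_[0])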
Concerning the performance of SVM classifiers in the case of linearly separable
classes, the work of Raudys (1997) has shown that the classification error is mainly
determined by the dimensionality ratio, and that larger separation margins result,
on average, in better generalization.
When the classes are non-separable, the optimal hyperplane must take into
account the deviations from the ideal separable situation. In the approach
introduced by Cortes and Vapnik (1995), the conditions (5-89) for the
determination of the optimal hyperplane are reformulated as:
\[
\text{minimize} \quad \Phi(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \xi_i ,
\]
\[
\text{subject to} \quad t_i\left(\mathbf{w}'\mathbf{x}_i + w_0\right) \ge 1 - \xi_i , \qquad i = 1, \ldots, n ,
\]
where the $\xi_i$ are nonnegative slack variables, penalizing the deviation of a data
point from the ideal separable situation.
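The constrained problem above is equivalent to an unconstrained one, since at the optimum $\xi_i = \max(0,\, 1 - t_i(\mathbf{w}'\mathbf{x}_i + w_0))$. The following Python sketch, with hypothetical data and step size, minimizes $\tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$ by simple subgradient descent on this equivalent form; it is a didactic substitute for the quadratic-programming solution used in practice:

import numpy as np

def soft_margin_svm(X, t, C=1.0, lr=0.001, epochs=5000):
    # Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - t_i*(w'x_i + w0)).
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = t * (X @ w + w0)
        viol = margins < 1                                 # points with positive slack
        grad_w = w - C * (t[viol][:, None] * X[viol]).sum(axis=0)
        grad_w0 = -C * t[viol].sum()
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0

# Hypothetical overlapping (non-separable) classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
t = np.hstack([np.ones(20), -np.ones(20)])

w, w0 = soft_margin_svm(X, t, C=1.0)
xi = np.maximum(0.0, 1.0 - t * (X @ w + w0))               # optimal slack values
print("w =", w, " w0 =", w0, " total slack =", xi.sum())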
For a point falling on the right side of the decision region but inside the region
of separation, the value of $\xi_i$ is smaller than one. This is the situation of point x1 in
Figure 5.46. For a point falling on the wrong side of the decision region a bigger
penalty, with $\xi_i > 1$, is applied. This is the situation of the points x2 and x3.
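This interpretation of the slack values can be checked numerically. In the sketch below (hypothetical overlapping Gaussian classes, scikit-learn assumed), $\xi_i = \max(0,\, 1 - t_i d(\mathbf{x}_i))$ is zero for points outside the region of separation, between zero and one for points in the situation of x1 in Figure 5.46, and larger than one for points on the wrong side, as x2 and x3:

import numpy as np
from sklearn.svm import SVC

# Hypothetical non-separable data: two overlapping Gaussian classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
t = np.hstack([np.ones(30), -np.ones(30)])

svm = SVC(kernel='linear', C=1.0).fit(X, t)
d = svm.decision_function(X)                 # d(x) = w'x + w0
xi = np.maximum(0.0, 1.0 - t * d)            # slack of every data point

print("xi = 0 (outside the region of separation):", int(np.sum(xi == 0)))
print("0 < xi < 1 (right side, inside the margin):", int(np.sum((xi > 0) & (xi < 1))))
print("xi > 1 (wrong side of the decision border):", int(np.sum(xi > 1)))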
The support vectors are now the vectors that satisfy the condition: