5.10 Support Vector Machines 217
Hence, when maximizing the separation margin r, we are in fact minimizing the
Vapnik-Chervonenkis dimension and, therefore, minimizing the structural risk. As
can be seen from expression (5-63), this is particularly beneficial for a small value
of n/d_VC (say below 20). Also note that the complexity of the SVM approach
depends on the number of support vectors used, and is independent of the
problem's dimensionality (for more details see e.g. Cherkassky and Mulier, 1998).
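The remark above can be illustrated with a small sketch using scikit-learn (a library not used in the text; names such as `n_sv` are ours): a fitted linear SVM is represented by its support vectors alone, so the model stays compact even when the input dimensionality grows.

```python
# Illustrative sketch: number of support vectors for two very different
# input dimensionalities (assumes scikit-learn is available).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_sv = {}
for d in (2, 50):  # low- and high-dimensional versions of the same task
    # two well-separated Gaussian classes, 40 samples each
    X = np.vstack([rng.normal(-3.0, 1.0, (40, d)),
                   rng.normal(+3.0, 1.0, (40, d))])
    y = np.array([-1] * 40 + [1] * 40)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    n_sv[d] = len(clf.support_vectors_)

print(n_sv)  # a handful of support vectors in both cases
```

In both runs only a few of the 80 training samples end up as support vectors, which is what makes the expansion of the decision function cheap to evaluate regardless of d.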
For the canonical hyperplane the norm of the weight vector controls the margin
of separation. The SVM approach of maximizing the margin of separation
(equivalent to minimizing the norm of the weight vector) can be expressed then as
the following constrained optimisation problem:
minimize Φ(w) = ½||w||²,   (5-89)
subject to t_i(w'x_i + w_0) ≥ 1,  i = 1, ..., n.
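As a sanity check, the primal problem (5-89) can be handed to a general-purpose constrained optimiser. The sketch below (our own illustration, not from the text) uses scipy's SLSQP on a toy two-point data set, packing the unknowns as z = [w_1, w_2, w_0].

```python
# Minimal sketch: solving the primal SVM problem on two points,
# one per class, with scipy's SLSQP solver.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [-1.0, -1.0]])  # one sample per class
t = np.array([1.0, -1.0])                 # target labels +/-1

def cost(z):
    w = z[:2]
    return 0.5 * w @ w                    # Phi(w) = (1/2)||w||^2

# One margin constraint t_i (w'x_i + w_0) - 1 >= 0 per sample.
constraints = [
    {"type": "ineq", "fun": lambda z, i=i: t[i] * (z[:2] @ X[i] + z[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(cost, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, w0 = res.x[:2], res.x[2]
print(w, w0)  # close to w = (0.5, 0.5), w0 = 0 for this data
```

For this symmetric data set the optimum can be checked by hand: w = (½, ½), w_0 = 0 gives t_i(w'x_i + w_0) = 1 on both points, i.e. both lie exactly on the margin.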
Notice that the minimization of the cost function Φ(w) assures the
minimization of the norm of the weight vector, with the advantage that, Φ(w)
being a quadratic function, there is a standard method for solving this
so-called quadratic programming problem. This standard method is called the
Lagrange multipliers method, and consists of computing the saddle point of the
Lagrangian function:
J(w, w_0, α) = ½||w||² − Σ_{i=1..n} α_i [t_i(w'x_i + w_0) − 1],   (5-90)
minimizing with respect to the weights and bias and maximizing with respect to
the nonnegative Lagrange multipliers α_i.
If we differentiate the Lagrangian function with respect to w and w_0, the
following optimality conditions are derived:
∂J/∂w = 0  ⇒  w = Σ_{i=1..n} α_i t_i x_i,   (5-91)
∂J/∂w_0 = 0  ⇒  Σ_{i=1..n} α_i t_i = 0.   (5-92)
In order to compute the weights all we need now are the values of the
Lagrange multipliers. When we expand the expression of J(w, w_0, α) and use the
results (5-91), an expression depending only on α is obtained:
Q(α) = Σ_{i=1..n} α_i − ½ Σ_{i=1..n} Σ_{j=1..n} α_i α_j t_i t_j x_i'x_j.   (5-93)
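The dual maximization of Q(α) can be sketched numerically for the same toy two-point problem used above (our own illustration; variable names such as `H` and `neg_Q` are ours). We minimize −Q(α) under α_i ≥ 0 and the condition (5-92), then recover the weights from (5-91).

```python
# Minimal sketch: maximizing the dual Q(a) for a toy two-point problem
# with scipy's SLSQP, then recovering w from the optimality condition.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
t = np.array([1.0, -1.0])
H = (t[:, None] * X) @ (t[:, None] * X).T   # H_ij = t_i t_j x_i'x_j

def neg_Q(a):                               # minimize -Q(a)
    return -(a.sum() - 0.5 * a @ H @ a)

res = minimize(
    neg_Q, x0=np.zeros(2), method="SLSQP",
    bounds=[(0.0, None)] * 2,                              # a_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ t}],  # sum a_i t_i = 0
)
alpha = res.x
w = (alpha * t) @ X   # w = sum_i a_i t_i x_i
print(alpha, w)       # close to alpha = (0.25, 0.25), w = (0.5, 0.5)
```

The recovered w = (½, ½) agrees with the solution of the primal problem for this data, as the primal-dual relationship requires; the bias w_0 would then follow from the margin condition of any support vector.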
The extremal problems (5-89) and (5-93) are known as the primal and dual
problems, respectively. The dual problem consists of maximizing Q(α), yielding
nonnegative Lagrange multipliers that satisfy condition (5-92). Notice that the