Page 180 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
NONPARAMETRIC LEARNING 169
infinite set of solutions. In contrast, the support vector classifier chooses
one particular solution: the classifier which separates the classes with
maximal margin. The margin is defined as the width of the largest ‘tube’
not containing samples that can be drawn around the decision boundary;
see Figure 5.10. It can be proven that this particular solution has the
highest generalization ability.
Mathematically, this can be expressed as follows. Assume we have
training samples z_n, n = 1, ..., N_S (not augmented with an extra element)
and for each sample a label c_n ∈ {−1, +1}, indicating to which of the two
classes the sample belongs. Then a linear classifier g(z) = wᵀz + b is
sought, such that:

    wᵀz_n + b ≥ +1  if c_n = +1
                                  for all n        (5.52)
    wᵀz_n + b ≤ −1  if c_n = −1

These two constraints can be rewritten into one inequality:

    c_n (wᵀz_n + b) ≥ 1        (5.53)
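The equivalence of (5.52) and (5.53) is easy to check numerically. The sketch below uses a small hypothetical data set and a hand-picked hyperplane (neither is from the book) to verify that the single inequality (5.53) holds exactly when the two-sided constraints (5.52) hold.

```python
import numpy as np

# Hypothetical toy data: two linearly separable classes (not from the book).
Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])  # samples z_n
c = np.array([+1, +1, -1, -1])                                      # labels c_n

# A hand-picked separating hyperplane g(z) = w^T z + b.
w = np.array([0.5, 0.5])
b = 0.0

g = Z @ w + b  # g(z_n) for every sample

# The two constraints of (5.52), checked case by case ...
two_sided = np.where(c == +1, g >= 1, g <= -1)
# ... are equivalent to the single inequality (5.53): c_n (w^T z_n + b) >= 1.
one_sided = c * g >= 1

print(two_sided.tolist())                  # [True, True, True, True]
print(np.array_equal(two_sided, one_sided))  # True
```

Multiplying by c_n flips the direction of the inequality exactly when c_n = −1, which is why the two cases collapse into one.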
The gradient vector of g(z) is w. Therefore, the square of the margin is
inversely proportional to ‖w‖² = wᵀw. To maximize the margin, we
have to minimize ‖w‖². Using Lagrange multipliers, we can incorporate
the constraints (5.53) into the minimization:

    L = ½‖w‖² − Σ_{n=1}^{N_S} α_n (c_n(wᵀz_n + b) − 1),   α_n ≥ 0        (5.54)
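To see the optimization concretely, the primal problem can also be handed to a generic constrained solver. The sketch below (hypothetical toy data and SciPy's SLSQP solver, neither from the book) minimizes ½‖w‖² subject to (5.53) and then reads off the margin, which equals 2/‖w‖ for the maximal-margin solution.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable classes (not from the book).
Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
c = np.array([+1.0, +1.0, -1.0, -1.0])

# Decision variables x = [w1, w2, b]; minimize (1/2) w^T w ...
objective = lambda x: 0.5 * (x[0] ** 2 + x[1] ** 2)
# ... subject to the constraints (5.53): c_n (w^T z_n + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda x, z=z, cn=cn: cn * (z @ x[:2] + x[2]) - 1.0}
               for z, cn in zip(Z, c)]

res = minimize(objective, x0=[1.0, 1.0, 0.0], constraints=constraints)
w, b = res.x[:2], res.x[2]

# The margin (the width of the 'tube' in Figure 5.10) equals 2 / ||w||.
print(np.round(w, 3), round(b, 3))        # close to [0.25 0.25] and 0.0
print(round(2.0 / np.linalg.norm(w), 3))  # close to 4*sqrt(2), about 5.657
```

The constraints are active only for the samples nearest the boundary; these are the support vectors of Figure 5.10, and in the Lagrangian (5.54) they are exactly the samples with α_n > 0.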
[Figure 5.10: The linear support vector classifier. The decision boundary wᵀz + b = 0 runs midway between the two margin hyperplanes wᵀz + b = −1 (class c = −1) and wᵀz + b = +1 (class c = +1); the samples lying on these margin hyperplanes are the support vectors, and the distance between the hyperplanes is the margin.]