Page 181 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
170 SUPERVISED LEARNING
L should be minimized with respect to w and b, and maximized with respect to the Lagrange multipliers $\alpha_n$. Setting the partial derivatives of L with respect to w and b to zero results in the constraints:
$$\mathbf{w} = \sum_{n=1}^{N_S} \alpha_n c_n \mathbf{z}_n, \qquad \sum_{n=1}^{N_S} c_n \alpha_n = 0 \tag{5.55}$$
Substituting these constraints back into (5.54) gives the so-called dual form:
$$L = \sum_{n=1}^{N_S} \alpha_n - \frac{1}{2} \sum_{n=1}^{N_S} \sum_{m=1}^{N_S} c_n c_m \alpha_n \alpha_m \mathbf{z}_n^T \mathbf{z}_m, \qquad \alpha_n \geq 0 \tag{5.56}$$
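To make (5.56) concrete, the following minimal sketch evaluates the dual objective in plain Python. It needs only the labels $c_n$, the multipliers $\alpha_n$ and the matrix of inner products $\mathbf{z}_n^T \mathbf{z}_m$. The function names `gram_matrix` and `dual_objective` and the two-sample data set are hypothetical illustrations. For this toy problem, with $\alpha_1 = \alpha_2 = a$, the objective reduces to $L = 2a - 4a^2$, which is largest at $a = 1/4$.

```python
def gram_matrix(Z):
    """Matrix of inner products z_n^T z_m: the only data that (5.56) uses."""
    return [[sum(a * b for a, b in zip(zn, zm)) for zm in Z] for zn in Z]


def dual_objective(alpha, c, G):
    """Evaluate L in (5.56) from multipliers alpha, labels c (+1/-1) and Gram matrix G."""
    n = len(alpha)
    quad = sum(c[i] * c[j] * alpha[i] * alpha[j] * G[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


# Hypothetical toy problem: one sample per class.
Z = [(1.0, 1.0), (-1.0, -1.0)]
c = [1, -1]
G = gram_matrix(Z)
for a in (0.1, 0.25, 0.4):
    # Equal multipliers keep sum_n c_n alpha_n = 0, as required by (5.55).
    print(a, dual_objective([a, a], c, G))
```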
L should be maximized with respect to the $\alpha_n$, subject to $\alpha_n \geq 0$ and to the equality constraint $\sum_n c_n \alpha_n = 0$ from (5.55). This is a quadratic optimization problem, for which standard software packages are available. After optimization, the $\alpha_n$ are substituted into (5.55) to find w. In typical problems the solution is sparse, meaning that many of the $\alpha_n$ become 0. Samples $\mathbf{z}_n$ for which $\alpha_n = 0$ do not contribute to w and are not required in its computation. The remaining samples $\mathbf{z}_n$ (for which $\alpha_n > 0$) are called support vectors.
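In practice the quadratic optimization is handed to a standard package. The sketch below is a self-contained illustration only. It maximizes (5.56) with a simplified pairwise update in the style of Platt's sequential minimal optimization (SMO), specialized to separable data (no upper bound on $\alpha_n$). Each step changes two multipliers so that $\sum_n c_n \alpha_n = 0$ and $\alpha_n \geq 0$ stay satisfied. Support vectors are then read off as the samples with $\alpha_n$ above a small threshold, and w follows from (5.55). The offset b is recovered from the support-vector condition $c_s(\mathbf{w}^T\mathbf{z}_s + b) = 1$, a standard step that this page does not show. All names, tolerances and data are hypothetical.

```python
def solve_dual(Z, c, tol=1e-9, max_sweeps=1000):
    """Maximize the dual (5.56) for separable data by SMO-style pairwise updates."""
    n = len(Z)
    G = [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]
    alpha = [0.0] * n  # feasible start: sum_n c_n alpha_n = 0

    def err(k):
        # Output without the offset b, minus the target; b cancels in err(i) - err(j).
        return sum(alpha[m] * c[m] * G[m][k] for m in range(n)) - c[k]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            for j in range(n):
                eta = G[i][i] + G[j][j] - 2.0 * G[i][j]  # curvature along the pair direction
                if i == j or eta <= tol:
                    continue
                # Allowed range of alpha_j that keeps c_i*alpha_i + c_j*alpha_j fixed and both >= 0.
                if c[i] != c[j]:
                    lo, hi = max(0.0, alpha[j] - alpha[i]), float("inf")
                else:
                    lo, hi = 0.0, alpha[i] + alpha[j]
                new_j = alpha[j] + c[j] * (err(i) - err(j)) / eta
                new_j = min(max(new_j, lo), hi)
                if abs(new_j - alpha[j]) < tol:
                    continue
                alpha[i] += c[i] * c[j] * (alpha[j] - new_j)
                alpha[j] = new_j
                changed = True
        if not changed:
            break
    return alpha


def train(Z, c, sv_tol=1e-6):
    """Return multipliers, support-vector indices, w from (5.55) and offset b."""
    alpha = solve_dual(Z, c)
    support = [k for k, a in enumerate(alpha) if a > sv_tol]
    dim = len(Z[0])
    w = [sum(alpha[k] * c[k] * Z[k][i] for k in support) for i in range(dim)]
    # For each support vector c_s (w^T z_s + b) = 1, so b = c_s - w^T z_s; average them.
    b = sum(c[s] - sum(wi * zi for wi, zi in zip(w, Z[s])) for s in support) / len(support)
    return alpha, support, w, b


# Hypothetical separable data: two samples per class.
Z = [(1.0, 1.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, -1.0)]
c = [1, 1, -1, -1]
alpha, support, w, b = train(Z, c)
print(alpha)    # [0.25, 0.0, 0.25, 0.0]
print(support)  # [0, 2]
print(w, b)     # [0.5, 0.5] 0.0
```

The samples (2, 3) and (-2, -1) end with $\alpha_n = 0$: they lie beyond the margin and play no part in w, which is the sparsity described above.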
This formulation of the support vector classifier is of limited use: it only covers a linear classifier for separable data. To construct nonlinear boundaries, the discriminant functions introduced in (5.39) can be applied: the data are transformed from the measurement space to a new feature space. In this case the transformation can be done efficiently, because in formulation (5.56) the samples enter only through the inner products $\mathbf{z}_n^T \mathbf{z}_m$. For instance, when all polynomial terms up to degree 2 are used (as in (5.39), with the linear and cross terms scaled by $\sqrt{2}$), we can write:
$$\mathbf{y}(\mathbf{z}_n)^T \mathbf{y}(\mathbf{z}_m) = (\mathbf{z}_n^T \mathbf{z}_m + 1)^2 = K(\mathbf{z}_n, \mathbf{z}_m) \tag{5.57}$$
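Identity (5.57) can be checked numerically. The sketch below assumes the weighting under which it holds exactly: the constant 1, the linear terms scaled by $\sqrt{2}$, the squares, and the cross products scaled by $\sqrt{2}$. It also counts the size of the explicit expansion, $\binom{D+d}{d}$ monomials of degree at most $d$ in $D$ measurements (constant included), which the kernel never has to build. The helper names and sample vectors are hypothetical.

```python
import itertools
import math


def dot(a, b):
    """Ordinary inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def phi2(z):
    """Explicit degree-2 feature vector y(z) with y(z)^T y(z') = (z^T z' + 1)^2.

    Components: 1, sqrt(2)*z_i, z_i^2, and sqrt(2)*z_i*z_j for i < j.
    """
    r2 = math.sqrt(2.0)
    feats = [1.0]
    feats += [r2 * zi for zi in z]
    feats += [zi * zi for zi in z]
    feats += [r2 * z[i] * z[j] for i, j in itertools.combinations(range(len(z)), 2)]
    return feats


def poly_kernel(zn, zm, d=2):
    """K(z_n, z_m) = (z_n^T z_m + 1)^d; d = 2 gives the right-hand side of (5.57)."""
    return (dot(zn, zm) + 1.0) ** d


def n_poly_features(D, d):
    """Number of monomials of degree <= d in D variables: the explicit expansion size."""
    return math.comb(D + d, d)


zn, zm = (1.0, 2.0, -0.5), (0.5, -1.0, 3.0)
print(dot(phi2(zn), phi2(zm)), poly_kernel(zn, zm))  # equal up to rounding
print(len(phi2(zn)), n_poly_features(3, 2))          # 10 10
print(n_poly_features(100, 5))  # explicit features for D = 100, d = 5; the kernel needs one 100-term dot product
```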
This can be generalized further: instead of $(\mathbf{z}_n^T \mathbf{z}_m + 1)^2$, any integer degree $d > 1$ can be used, giving $K(\mathbf{z}_n, \mathbf{z}_m) = (\mathbf{z}_n^T \mathbf{z}_m + 1)^d$. Because only the inner products between the samples are needed, the very expensive explicit expansion into all polynomial terms is avoided. The resulting decision boundary is a $d$-th degree polynomial in the measurement space. The classifier w, which now lives in the transformed feature space, cannot easily be expressed explicitly (as in (5.55)). However, we are only interested in the classification result, and this is expressed in terms of the inner product between the object $\mathbf{z}$ to be classified and the classifier (compare also with (5.36)):