$$ g(\mathbf{z}) = \mathbf{w}^{\mathsf{T}}\mathbf{y}(\mathbf{z}) = \sum_{n=1}^{N_S} \alpha_n c_n K(\mathbf{z}, \mathbf{z}_n) \tag{5.58} $$
Replacing the inner product by a more general kernel function is called
the kernel trick. Besides polynomial kernels, other kernels have been
proposed. The Gaussian kernel with $\sigma^2\mathbf{I}$ as weighting matrix (the radial basis function kernel, or RBF kernel) is frequently used in practice:

$$ K(\mathbf{z}_n, \mathbf{z}_m) = \exp\left(-\frac{\lVert \mathbf{z}_n - \mathbf{z}_m \rVert^2}{\sigma^2}\right) \tag{5.59} $$
For very small values of $\sigma$, this kernel gives very detailed decision boundaries, while for large values very smooth boundaries are obtained.
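As an illustration, the RBF kernel matrix of (5.59) can be computed in a few lines of MATLAB. The following is a minimal sketch, not code from the book or PRTools; the function name rbf_kernel and its interface are our own choices.

function K = rbf_kernel(Z1, Z2, sigma)
% RBF_KERNEL  Gaussian (RBF) kernel matrix according to (5.59).
% Z1: N1 x D samples (one per row), Z2: N2 x D samples,
% sigma: kernel width. Requires implicit expansion (MATLAB R2016b+).
% Uses ||z_n - z_m||^2 = ||z_n||^2 + ||z_m||^2 - 2*z_n'*z_m.
sqd = sum(Z1.^2, 2) + sum(Z2.^2, 2)' - 2 * (Z1 * Z2');
K = exp(-sqd / sigma^2);
end

Evaluating this for decreasing sigma shows the behaviour described above: the kernel matrix approaches the identity matrix, and the decision boundaries follow the individual training samples ever more closely.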
In order to cope with overlapping classes, the support vector classifier can be extended to allow some samples to be erroneously classified. For that, the hard constraints (5.53) are replaced by soft constraints:

$$ \begin{aligned} \mathbf{w}^{\mathsf{T}}\mathbf{z}_n + b &\geq 1 - \xi_n \quad && \text{if } c_n = 1 \\ \mathbf{w}^{\mathsf{T}}\mathbf{z}_n + b &\leq -1 + \xi_n \quad && \text{if } c_n = -1 \end{aligned} \tag{5.60} $$
Here, so-called slack variables $\xi_n \geq 0$ are introduced. These should be minimized in combination with $\lVert\mathbf{w}\rVert^2$. The optimization problem is thus changed into:
$$ L = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N_S}\xi_n - \sum_{n=1}^{N_S}\alpha_n\bigl(c_n(\mathbf{w}^{\mathsf{T}}\mathbf{z}_n + b) - 1 + \xi_n\bigr) - \sum_{n=1}^{N_S}\mu_n\xi_n, \qquad \alpha_n, \mu_n \geq 0 \tag{5.61} $$
The second term expresses our desire to have the slack variables as small as possible. $C$ is a trade-off parameter that determines the balance between having a large overall margin at the cost of more erroneously classified samples, and having a small margin with fewer erroneously classified samples. The last term holds the Lagrange multipliers $\mu_n$ that are needed to assure that $\xi_n \geq 0$.
The dual formulation of this problem is the same as (5.56); its derivation is left as an exercise for the reader (see exercise 6). The only difference is that an extra upper bound on $\alpha_n$ is introduced: $\alpha_n \leq C$.
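To make this concrete: the dual problem, maximize $\sum_n \alpha_n - \tfrac12 \sum_n\sum_m \alpha_n\alpha_m c_n c_m K(\mathbf{z}_n,\mathbf{z}_m)$ subject to $\sum_n \alpha_n c_n = 0$ and $0 \leq \alpha_n \leq C$ (the upper bound follows from setting $\partial L/\partial \xi_n = 0$ in (5.61), which gives $\alpha_n = C - \mu_n$), can be solved with a general-purpose quadratic programming routine. The sketch below uses quadprog from MATLAB's Optimization Toolbox together with the rbf_kernel function sketched above; the variable names (Z, c, C, sigma, Ztest) and the tolerance 1e-6 are our own assumptions, not the book's.

% Soft-margin support vector classifier trained via the dual:
%   min_alpha  1/2*alpha'*H*alpha - sum(alpha)
%   s.t.       sum(alpha.*c) = 0,  0 <= alpha <= C
% Assumed inputs: Z (N x D samples), c (N x 1 labels in {-1,+1}),
% C (trade-off parameter), sigma (kernel width), Ztest (test samples).
N = size(Z, 1);
K = rbf_kernel(Z, Z, sigma);            % kernel matrix, cf. (5.59)
H = (c * c') .* K;                      % Hessian of the dual
H = H + 1e-10 * eye(N);                 % tiny ridge for numerical stability
f = -ones(N, 1);                        % quadprog minimizes, hence the sign
alpha = quadprog(H, f, [], [], c', 0, zeros(N, 1), C * ones(N, 1));

sv = alpha > 1e-6;                               % support vectors: alpha > 0
m  = find(alpha > 1e-6 & alpha < C - 1e-6, 1);   % a sample on the margin
b  = c(m) - (alpha(sv) .* c(sv))' * K(sv, m);    % from c_m*(w'*z_m + b) = 1

% Discriminant of the form (5.58), with the bias b added:
g = rbf_kernel(Ztest, Z(sv,:), sigma) * (alpha(sv) .* c(sv)) + b;

Samples that end up with $\alpha_n = C$ are the margin violators; their number grows as $C$ is decreased, which is exactly the trade-off described above.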