


    g(z) = w^T y(z) = \sum_{n=1}^{N_S} \alpha_n c_n K(z, z_n)                        (5.58)
Replacing the inner product by a more general kernel function is called
the kernel trick. Besides polynomial kernels, other kernels have been
proposed. The Gaussian kernel with \sigma^2 I as weighting matrix (the radial
basis function kernel, RBF kernel) is frequently used in practice:

    K(z_n, z_m) = \exp\left( - \frac{\| z_n - z_m \|^2}{\sigma^2} \right)                        (5.59)

For very small values of \sigma, this kernel gives very detailed boundaries,
while for large values very smooth boundaries are obtained.
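
As an illustration, (5.58) and (5.59) are easily evaluated in MATLAB. The
following is a minimal sketch, assuming that training has already produced
the support vectors (the rows of zs), their labels cs, the multipliers
alpha and the offset b; all variable names are hypothetical:

   % Discriminant of a support vector classifier with an RBF kernel.
   % z: 1 x D test sample; zs: NS x D support vectors; cs: NS x 1 labels
   % (+1 or -1); alpha: NS x 1 multipliers; b: offset; sigma: kernel width.
   function g = svc_eval(z, zs, cs, alpha, b, sigma)
     d2 = sum((zs - repmat(z, size(zs,1), 1)).^2, 2);  % ||z_n - z||^2
     K  = exp(-d2/sigma^2);                            % RBF kernel (5.59)
     g  = sum(alpha.*cs.*K) + b;                       % expansion (5.58) plus offset
   end

The class is assigned by the sign of g. A small sigma makes the kernel fall
off quickly around each support vector, producing the detailed boundaries
mentioned above; a large sigma produces smooth boundaries.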
  In order to cope with overlapping classes, the support vector classifier
can be extended to allow some samples to be erroneously classified. For
that purpose, the hard constraints (5.53) are replaced by soft constraints:

    w^T z_n + b \geq  1 - \xi_n   \quad \text{if } c_n = +1
    w^T z_n + b \leq -1 + \xi_n   \quad \text{if } c_n = -1                        (5.60)
Here, so-called slack variables \xi_n \geq 0 are introduced. These should be
minimized in combination with \|w\|^2. The optimization problem is
thus changed into:


    L = \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N_S} \xi_n
        - \sum_{n=1}^{N_S} \alpha_n \left( c_n (w^T z_n + b) - 1 + \xi_n \right)
        - \sum_{n=1}^{N_S} \gamma_n \xi_n,  \qquad \alpha_n, \gamma_n \geq 0                        (5.61)
The second term expresses our desire to have the slack variables as small
as possible. C is a trade-off parameter that determines the balance
between having a large overall margin at the cost of more erroneously
classified samples, or having a smaller margin with fewer erroneously
classified samples. The last term holds the Lagrange multipliers \gamma_n that
are needed to assure that \xi_n \geq 0.
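
The role of the slack variables can be made explicit: for a given w and b,
the smallest \xi_n that satisfies (5.60) is \xi_n = \max(0, 1 - c_n(w^T z_n + b)).
A minimal MATLAB sketch of the resulting primal objective, with hypothetical
variable names (Z holds the samples as rows, c the labels):

   % Smallest slacks that satisfy the soft constraints (5.60).
   xi = max(0, 1 - c.*(Z*w + b));   % xi_n = 0 for samples outside the margin
   J  = 0.5*(w'*w) + C*sum(xi);     % first two terms of (5.61)
   % Increasing C penalizes margin violations more heavily: a smaller
   % margin, but fewer erroneously classified samples.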
  The dual formulation of this problem is the same as (5.56). Its deriv-
ation is left as an exercise for the reader; see exercise 6. The only
difference is that an extra upper bound on \alpha_n is introduced: \alpha_n \leq C.
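
To see where this bound comes from: setting \partial L / \partial \xi_n = 0 in (5.61) gives
C - \alpha_n - \gamma_n = 0, and \gamma_n \geq 0 then implies \alpha_n \leq C. Numerically, the
dual is a quadratic program. The sketch below hands it to MATLAB's quadprog
(Optimization Toolbox), assuming a precomputed kernel matrix K and a label
vector c; the variable names are hypothetical:

   % Dual of the soft-margin support vector classifier:
   %   maximize  sum(alpha) - 0.5*alpha'*H*alpha,  with H = (c*c').*K,
   %   subject to  0 <= alpha_n <= C  and  sum(alpha_n*c_n) = 0.
   NS    = length(c);
   H     = (c*c').*K;            % c: NS x 1 labels, K: NS x NS kernel matrix
   f     = -ones(NS,1);          % quadprog minimizes, so negate the linear term
   alpha = quadprog(H, f, [], [], c', 0, zeros(NS,1), C*ones(NS,1));
   sv    = find(alpha > 1e-6);   % indices of the support vectors

Samples whose \alpha_n ends up at the upper bound C are those that lie inside
the margin or on the wrong side of the decision boundary.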