

L should be minimized with respect to w and b, and maximized with
respect to the Lagrange multipliers α_n. Setting the partial derivatives of L
w.r.t. w and b to zero results in the constraints:


    w = \sum_{n=1}^{N_S} \alpha_n c_n z_n , \qquad \sum_{n=1}^{N_S} c_n \alpha_n = 0                (5.55)
            Resubstituting this into (5.54) gives the so-called dual form:

    L = \sum_{n=1}^{N_S} \alpha_n - \frac{1}{2} \sum_{n=1}^{N_S} \sum_{m=1}^{N_S} c_n c_m \alpha_n \alpha_m z_n^T z_m , \qquad \alpha_n \ge 0                (5.56)
L should be maximized with respect to the α_n. This is a quadratic
optimization problem, for which standard software packages are avail-
able. After optimization, the α_n are used in (5.55) to find w. In typical
problems, the solution is sparse, meaning that many of the α_n become 0.
Samples z_n for which α_n = 0 are not required in the computation of w.
The remaining samples z_n (for which α_n > 0) are called support vectors.
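For instance, in MATLAB the dual problem (5.56) can be handed to
quadprog from the Optimization Toolbox. The sketch below is our own
minimal illustration, not code from this book: it assumes the samples z_n
are stored as the rows of an N_S-by-D matrix Z and the labels
c_n ∈ {−1, +1} in a vector c, and that the data are separable. Since
quadprog minimizes, the sign of (5.56) is flipped.

    NS = size(Z, 1);
    H  = (c*c') .* (Z*Z');                % H(n,m) = c_n c_m z_n' z_m
    H  = H + 1e-10*eye(NS);               % tiny ridge for numerical stability
    f  = -ones(NS, 1);                    % quadprog minimizes 0.5*a'*H*a + f'*a
    alpha = quadprog(H, f, [], [], c', 0, zeros(NS, 1), []);
    sv = alpha > 1e-6;                    % support vectors: alpha_n > 0
    w  = Z(sv,:)' * (alpha(sv).*c(sv));   % equation (5.55)
    b  = mean(c(sv) - Z(sv,:)*w);         % from c_n (w'z_n + b) = 1 at the margin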
  This formulation of the support vector classifier is of limited use: it
only covers a linear classifier for separable data. To construct nonlinear
boundaries, the discriminant functions introduced in (5.39) can be applied.
The data are transformed from the measurement space to a new feature
space. This can be done efficiently here, because in formulation (5.56)
the samples enter only through inner products with other samples. For
instance, when all polynomial terms up to degree 2 are used (as in
(5.39)), we can write:


    y(z_n)^T y(z_m) = (z_n^T z_m + 1)^2 = K(z_n, z_m)                (5.57)
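As a quick numerical check (our illustration, not part of the text), the
identity (5.57) can be verified in MATLAB for two-dimensional measure-
ments. Note that the explicit expansion y(z) must carry factors √2 on the
linear and cross terms for the equality to hold exactly:

    phi = @(z) [1; sqrt(2)*z(1); sqrt(2)*z(2); z(1)^2; z(2)^2; sqrt(2)*z(1)*z(2)];
    zn = [0.3; -1.2];  zm = [2.0; 0.7];   % two arbitrary 2-D samples
    (zn'*zm + 1)^2                        % kernel evaluation K(z_n, z_m)
    phi(zn)'*phi(zm)                      % same value via the explicit expansion

Both expressions print 0.5776. The kernel evaluation costs O(D) opera-
tions, whereas the explicit degree-d expansion has on the order of D^d
terms; this is the saving referred to below.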
This can be generalized further: instead of (z_n^T z_m + 1)^2, any integer
degree d > 1 can be used in (z_n^T z_m + 1)^d. Because only the inner
products between the samples are considered, the very expensive explicit
expansion is avoided. The resulting decision boundary is a d-th degree
polynomial in the measurement space. The classifier w cannot easily
be expressed explicitly (as in (5.55)). However, we are only interested in the
classification result, and this is given in terms of the inner product between
the object z to be classified and the classifier (compare also with (5.36)):