

determining the hyperplane that maximizes the margin of separation, called the
optimal hyperplane.
   Imagine that we had found the optimal hyperplane, i.e., the root set of
w'x + w_0 = 0. Concerning the root set, the values of w and w_0 are obviously not
unique, and we may always divide all the weights plus the bias by the same scalar
factor without changing the hyperplane. Let us assume then that we have scaled w
and w_0 in such a way that the minimum distance of a point to the hyperplane is
1/||w||, i.e.,

   min_i | w'x_i + w_0 | = 1 .                                          (5-86)
   A hyperplane satisfying this condition is called a canonical hyperplane and the
vectors x_i corresponding to this minimum distance are called support vectors.
Condition (5-86) can also be written as follows:

   t_i (w'x_i + w_0) = 1   if and only if x_i is a support vector .     (5-87)
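
   As a concrete illustration of the canonical scaling, the short sketch below
(illustrative Python, not from the text; the toy data and the hand-picked w, w_0
are assumptions) rescales a separating hyperplane so that condition (5-86) holds
and then flags the minimum-distance vectors via condition (5-87):

   import numpy as np

   # Toy linearly separable data; class targets t_i in {-1, +1}.
   X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
   t = np.array([1, 1, -1, -1])

   # Any (w, w_0) whose root set separates the classes will do;
   # this one is chosen by hand purely for illustration.
   w = np.array([1.0, 1.0])
   w0 = 0.0

   # Scale w and w0 so that min_i |w'x_i + w0| = 1  (condition 5-86).
   scale = np.min(np.abs(X @ w + w0))
   w, w0 = w / scale, w0 / scale

   margins = t * (X @ w + w0)            # t_i (w'x_i + w0)
   support = np.isclose(margins, 1.0)    # condition (5-87): support vectors
   print("support vectors:", X[support])
   print("margin of separation 2/||w|| =", 2.0 / np.linalg.norm(w))

Note that the sketch only normalizes a given hyperplane; it does not search for
the optimal one.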

Figure 5.44. Optimal linear discriminant, with margin of separation 2/||w|| and
support vectors (grey coloured circles) from both classes.
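
   To reproduce a picture like Figure 5.44, one possible route (an assumption;
the text does not prescribe any particular software) is to fit a linear SVM with
a very large penalty parameter, which approximates the hard-margin optimal
hyperplane, and read off the support vectors and the margin 2/||w||:

   import numpy as np
   from sklearn.svm import SVC

   # Toy two-class data, linearly separable (illustrative only).
   X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],
                 [-1.0, -1.0], [-2.0, -3.0], [0.0, -2.0]])
   t = np.array([1, 1, 1, -1, -1, -1])

   # A very large C approximates the hard-margin (optimal hyperplane) case.
   clf = SVC(kernel="linear", C=1e6).fit(X, t)

   w = clf.coef_[0]
   w0 = clf.intercept_[0]
   print("support vectors:\n", clf.support_vectors_)
   print("margin of separation 2/||w|| =", 2.0 / np.linalg.norm(w))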



   It can be shown (Vapnik, 1998) that the Vapnik-Chervonenkis dimension in a
d-dimensional space, for a machine using a canonical hyperplane with a margin of
separation r, and sample feature vectors within a sphere of radius R, is bounded
as follows:
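
   For orientation only, a frequently cited form of this result (Vapnik, 1998) is
h <= min(ceil(R^2 / D^2), d) + 1, where D denotes the distance from the hyperplane
to the closest sample, so that the margin of separation equals 2D; whether the
book's symbol r stands for D or for 2D is an assumption here, as is the exact
expression. A minimal numeric sketch of that assumed form:

   import math

   def vc_bound(R, delta, d):
       # Assumed form of the bound: h <= min(ceil(R^2 / delta^2), d) + 1,
       # with delta the distance from the hyperplane to the closest sample.
       return min(math.ceil(R**2 / delta**2), d) + 1

   # A wide margin relative to the data radius caps the VC dimension
   # well below the ambient dimension d.
   print(vc_bound(R=1.0, delta=0.5, d=100))   # prints 5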