determining the hyperplane that maximizes the margin of separation, called the optimal hyperplane.
Imagine that we had found the optimal hyperplane, i.e., the root set of $w'x + w_0 = 0$. For a given root set, the values of $w$ and $w_0$ are obviously not unique: we may always divide all the weights and the bias by the same scalar factor without changing the hyperplane. Let us assume then that we have scaled $w$ and $w_0$ in such a way that the minimum distance of a point to the hyperplane is $1/\|w\|$, i.e.,

$$\min_i |w' x_i + w_0| = 1 . \qquad (5\text{-}86)$$
A hyperplane satisfying this condition is called a canonical hyperplane and the vectors $x_i$ corresponding to this minimum distance are called support vectors.
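As a small numerical illustration of this rescaling (a minimal sketch; the weight vector, bias and sample points below are arbitrary assumptions, not taken from the text), the following Python snippet divides $w$ and $w_0$ by $\min_i |w' x_i + w_0|$ so that condition (5-86) holds, and then confirms that the closest sample lies at distance $1/\|w\|$ from the hyperplane:

```python
import numpy as np

# Hypothetical separating hyperplane w'x + w0 = 0 and a few sample points.
w = np.array([2.0, 1.0])
w0 = -3.0
X = np.array([[2.0, 1.0], [3.0, 2.0], [0.5, 0.5], [0.0, 1.0]])

# Smallest absolute value of w'x_i + w0 over the sample.
m = np.min(np.abs(X @ w + w0))

# Dividing weights and bias by the same scalar leaves the hyperplane unchanged,
# but now min_i |w'x_i + w0| = 1, i.e., the hyperplane is canonical (5-86).
w_c, w0_c = w / m, w0 / m

# The geometric distance of the closest point is now 1/||w_c||.
distances = np.abs(X @ w_c + w0_c) / np.linalg.norm(w_c)
print(np.isclose(distances.min(), 1.0 / np.linalg.norm(w_c)))  # True
```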
Condition (5-86) can also be written as follows:

$t_i (w' x_i + w_0) = 1$ if and only if $x_i$ is a support vector. (5-87)
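Condition (5-87) also suggests a direct numerical check for support vectors. The sketch below is only an illustration under assumed data; it uses scikit-learn (not referenced in the text) with a very large C so that the fitted linear SVM approximates the maximal margin solution, and verifies that $t_i (w' x_i + w_0) = 1$ holds, up to numerical tolerance, precisely for the vectors the solver reports as support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Assumed, linearly separable two-class sample with targets t_i in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0],
              [6.0, 5.0], [7.0, 6.5], [8.0, 5.5]])
t = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (maximal margin) classifier.
svm = SVC(kernel="linear", C=1e6).fit(X, t)
w, w0 = svm.coef_[0], svm.intercept_[0]

# t_i (w'x_i + w0) equals 1 (up to tolerance) only for the support vectors.
margins = t * (X @ w + w0)
print(np.flatnonzero(np.isclose(margins, 1.0, atol=1e-3)))  # support vector indices
print(svm.support_)  # the solver's own list of support vector indices
```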
Figure 5.44. Optimal linear discriminant, with margin of separation $2/\|w\|$ and support vectors (grey coloured circles) from both classes.
It can be shown (Vapnik, 1998) that the Vapnik-Chervonenkis dimension in a d-dimensional space, for a machine using a canonical hyperplane with a margin of separation r and sample feature vectors lying within a sphere of radius R, is bounded as follows: