5.9 Radial Basis Functions

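The formula (5-81) can be represented compactly in matrix form:

$$\mathbf{\Phi}\,\mathbf{w} = \mathbf{t}$$

where $\mathbf{\Phi}$ is the matrix of kernel values computed between pairs of
training points, $\mathbf{w}$ is the vector of weights and $\mathbf{t}$ is the
vector of target values.

For a large class of functions, and assuming that the training set is composed of
distinct points, matrix $\mathbf{\Phi}$ is non-singular and the weights needed for an
exact interpolation can be computed from:

$$\mathbf{w} = \mathbf{\Phi}^{-1}\,\mathbf{t}$$

The most common kernel function used is the Gaussian function:

$$\varphi(x) = \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right)$$

with $\sigma$ acting as smoothing parameter.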
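
The exact interpolation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Gaussian kernel is taken in the form exp(-r^2 / (2 sigma^2)), distances are Euclidean, and the function names and the toy data are hypothetical, not taken from the text.

```python
import numpy as np

def gaussian_kernel(r, sigma):
    """Gaussian radial kernel evaluated at distances r (assumed form)."""
    return np.exp(-r**2 / (2.0 * sigma**2))

def pairwise_distances(A, B):
    """Euclidean distances between the rows of A (n x d) and of B (m x d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))

def fit_exact_rbf(X, t, sigma):
    """Solve Phi w = t, with one basis function per training point."""
    Phi = gaussian_kernel(pairwise_distances(X, X), sigma)
    # Solving the linear system is numerically safer than forming the inverse.
    return np.linalg.solve(Phi, t)

def predict_exact_rbf(X_train, w, sigma, X_new):
    """Evaluate the interpolant at the rows of X_new."""
    Phi_new = gaussian_kernel(pairwise_distances(X_new, X_train), sigma)
    return Phi_new @ w

# Toy one-dimensional training set with distinct points (made-up values).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([0.0, 0.8, 0.9, 0.1])
sigma = 0.7

w = fit_exact_rbf(X, t, sigma)
fitted = predict_exact_rbf(X, w, sigma, X)
```

Because there is one basis function per training point, `fitted` reproduces `t` up to rounding error, which is exactly the behaviour that the modifications listed next are meant to relax.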
As we have already seen in previous sections, we are not interested in exact
interpolation, but rather in an interpolating solution capable of generalization.
Some modifications therefore have to be introduced in the exact interpolation
method:
- The number of radial basis functions is typically much smaller than n, since they
are chosen relative to some centroid patterns, $m_j$, instead of relative to the
training patterns.
- In order to obtain good generalization properties the centroids will have to be
adjusted as part of a training process.
- Instead of having a common smoothing parameter $\sigma$, each basis function can
have its own smoothing parameter $\sigma_j$, also determined during the network training.
- Bias parameters are included in the summation of the kernel values in order to
compensate for the difference between the average value over the basis
functions and the average value of the targets.
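
The modified model in the list above can be sketched as a forward pass. The sketch assumes Gaussian kernels, Euclidean distance, a single output and a single bias term; the function name `rbf_forward` and all numerical values are hypothetical and serve only as illustration.

```python
import numpy as np

def rbf_forward(X, centroids, sigmas, weights, bias):
    """Output of an RBF model with m centroids and one output.

    X         : (n, d) input patterns
    centroids : (m, d) centroid patterns m_j, usually with m much smaller than n
    sigmas    : (m,)   one smoothing parameter sigma_j per basis function
    weights   : (m,)   output weights
    bias      : float  bias added to the weighted sum of kernel values
    """
    diff = X[:, None, :] - centroids[None, :, :]
    sq_dist = (diff**2).sum(axis=2)               # (n, m) squared distances
    # sigmas has shape (m,), so it broadcasts across the columns of sq_dist.
    hidden = np.exp(-sq_dist / (2.0 * sigmas**2))
    return hidden @ weights + bias

# Two centroids in two dimensions, with different smoothing parameters.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([0.5, 1.0])
weights = np.array([1.0, -2.0])
bias = 0.5

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = rbf_forward(X, centroids, sigmas, weights, bias)
```

Here the centroids, smoothing parameters, weights and bias are fixed by hand; in the network they would all be determined during training.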

The radial basis function (RBF) network implements these requirements with the
architecture of Figure 5.42.
The weights of the first layer (radial layer) of an RBF neural net are used to
adjust the centroids $m_j$ and smoothing factors $\sigma_j$ used by the kernel functions.
Besides a number of ad-hoc methods to choose the centroids (for instance,
randomly selected or equally separated in the whole range of the training samples),
the centroids can be determined sensibly using the k-means clustering algorithm
explained in section 3.5. Next, the smoothing parameters are chosen, for instance,
by averaging the distance from a centroid to its k-nearest neighbours. In this way,
the smoothing effect is smaller in regions where the pattern distribution is more
peaked.
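
The two-step initialization just described can be sketched as follows. Several points are assumptions and not taken from the text: k-means is written here as a plain Lloyd iteration with random initial centroids, and the neighbours of a centroid are taken to be the other centroids (using the nearest training patterns instead would be an equally plausible reading). The names `kmeans` and `knn_sigmas` and the synthetic data are hypothetical.

```python
import numpy as np

def kmeans(X, m, n_iter=50, seed=0):
    """Plain Lloyd iteration returning m centroids (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Start from m distinct training patterns chosen at random.
    centroids = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centroids[None, :, :])**2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def knn_sigmas(centroids, k):
    """Average distance from each centroid to its k nearest other centroids."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)        # a centroid is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Synthetic two-dimensional patterns drawn around three locations.
rng = np.random.default_rng(1)
X = np.concatenate([
    rng.normal(loc=[0.0, 0.0], scale=0.1, size=(30, 2)),
    rng.normal(loc=[1.0, 0.0], scale=0.1, size=(30, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(30, 2)),
])

centroids = kmeans(X, m=3)
sigmas = knn_sigmas(centroids, k=2)       # requires k <= m - 1
```

Centroids that lie close to one another receive smaller smoothing parameters than isolated ones, which matches the behaviour described in the text for densely populated regions.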
