
5.9 Radial Basis Functions


                               The formula (5-81) can be represented compactly in matrix form:



                               For a large class of functions, and assuming that the training set is composed of
                             distinct points, the matrix Φ is non-singular and the weights needed for an exact
                             interpolation can be computed from:





                               The most common kernel function used is the Gaussian function:







                             with σ acting as a smoothing parameter.
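The exact interpolation scheme can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Gaussian kernel is taken in the usual form exp(-||x - x_i||^2 / (2σ^2)), which may differ by a constant factor from the book's formula, and the function names and toy data are hypothetical.

```python
import numpy as np

def gaussian_design_matrix(X, centres, sigma):
    """Phi[i, j] = exp(-||X[i] - centres[j]||^2 / (2 sigma^2)) (assumed kernel form)."""
    diff = X[:, None, :] - centres[None, :, :]      # shape (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)            # shape (n, m)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def exact_interpolation_weights(X, t, sigma):
    """Solve Phi w = t with one basis function centred on each training point."""
    Phi = gaussian_design_matrix(X, X, sigma)       # square n x n matrix
    return np.linalg.solve(Phi, t)

# Toy data (hypothetical): four distinct one-dimensional training points.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([0.0, 1.0, 0.0, -1.0])

w = exact_interpolation_weights(X, t, sigma=1.0)
fitted = gaussian_design_matrix(X, X, 1.0) @ w      # network output at the training points
```

With distinct training points the fitted values reproduce the targets up to rounding error, which is exactly the behaviour that makes this solution unsuitable when generalization is the goal.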
                               As we have already seen in previous sections, we are not interested in exact
                             interpolation but rather in an interpolation solution capable of generalization.
                             Some modifications therefore have to be introduced in the exact interpolation
                             method:
                             - The number of radial basis functions is typically much smaller than n, since they
                               are defined relative to some centroid patterns, m_j, rather than relative to the
                               training patterns.
                             - In order to obtain good generalization properties, the centroids have to be
                               adjusted as part of a training process.
                             - Instead of having a common smoothing parameter σ, each basis function can have
                               its own smoothing parameter σ_j, also determined during the network training.
                             - Bias parameters are included in the summation of the kernel values in order to
                               compensate for the difference between the average value over the basis
                               functions and the average value of the targets.

                                The radial basis function (RBF) network implements these requisites with  the
                              architecture of Figure 5.42.
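As a rough illustration of how such a network computes its output, the sketch below evaluates a single-output RBF net with one centroid and one smoothing parameter per radial unit, followed by a weighted sum plus bias. The Gaussian kernel form, the single output, and all names and numerical values are assumptions made for the example, not taken from the figure.

```python
import numpy as np

def rbf_forward(X, centroids, sigmas, weights, bias):
    """Output of a single-output RBF network (assumed Gaussian radial units).

    X: (n, d) inputs; centroids: (h, d); sigmas: (h,); weights: (h,); bias: scalar.
    """
    diff = X[:, None, :] - centroids[None, :, :]       # (n, h, d)
    sq_dist = np.sum(diff ** 2, axis=-1)               # (n, h)
    hidden = np.exp(-sq_dist / (2.0 * sigmas ** 2))    # each unit uses its own sigma
    return hidden @ weights + bias                     # linear output layer with bias

# Hypothetical parameters: two radial units in a two-dimensional input space.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([0.5, 1.0])
weights = np.array([2.0, -1.0])
bias = 0.5

X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
y = rbf_forward(X, centroids, sigmas, weights, bias)
```

For the input far from both centroids the radial units respond with values close to zero, so the output is essentially the bias term alone.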
                                The weights of the first layer (radial layer) of an RBF neural net are used to
                              adjust the centroids m_j and smoothing factors σ_j used by the kernel functions.
                              Besides a number of ad hoc methods for choosing the centroids (for instance,
                              selecting them at random or spacing them equally over the whole range of the
                              training samples), the centroids can be determined sensibly using the k-means
                              clustering algorithm explained in section 3.5. Next, the smoothing parameters are
                              chosen, for instance, by averaging the distances from each centroid to its
                              k nearest neighbours. In this way, the smoothing effect is smaller in regions
                              where the pattern distribution is more peaked.
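The two-step initialisation of the radial layer can be sketched as follows. This is a minimal illustration, not the procedure of section 3.5: the k-means loop is a plain Lloyd iteration with random initial centroids, the nearest neighbours of a centroid are assumed to be taken among the other centroids (the text leaves this open), and all names and data are hypothetical.

```python
import numpy as np

def kmeans(X, n_centroids, n_iter=50, seed=0):
    """Plain Lloyd iteration; initial centroids are random training patterns."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_centroids, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every pattern to its closest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = np.argmin(d, axis=1)
        # Move each centroid to the mean of its assigned patterns.
        for j in range(n_centroids):
            members = X[labels == j]
            if len(members) > 0:          # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def smoothing_from_neighbours(centroids, k=2):
    """sigma_j = mean distance from centroid j to its k nearest other centroids."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, 1:k + 1].mean(axis=1)   # column 0 is the distance to itself

# Hypothetical data: three Gaussian clouds in the plane.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(30, 2)) for m in means])

centroids = kmeans(X, n_centroids=3)
sigmas = smoothing_from_neighbours(centroids, k=2)
```

Centroids that lie close together receive small smoothing parameters, so the basis functions are narrower where the patterns are densely packed.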