
A lower bound on the number of samples needed for learning a class of functions mapping X into [0, 1], with error Pe and confidence level 1 − α, α < 0.01, is (Anthony and Bartlett, 1999):

\[
n \;\ge\; \frac{\mathrm{fat}(F,\, P_e q) - 1}{16},
\qquad \text{for any } 0 < q \le \tfrac{1}{4},
\tag{5-58}
\]


where q is the quantization level of the function values, i.e., they are computed as $q \cdot \mathrm{trunc}(f(x_i)/q)$.
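To make the quantization step concrete, here is a minimal Python sketch (illustrative only, not code from the text) that maps function values in [0, 1] onto the grid defined by a quantization level q:

```python
import numpy as np

def quantize(values, q):
    """Quantize function values in [0, 1] to level q, i.e. q * trunc(f(x) / q)."""
    values = np.asarray(values, dtype=float)
    return q * np.trunc(values / q)

# Example: function values quantized with q = 0.25.
outputs = np.array([0.07, 0.31, 0.49, 0.86, 1.0])
print(quantize(outputs, 0.25))   # [0.   0.25 0.25 0.75 1.  ]
```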
If we use formula (5-57) in (5-58), we obtain the estimated lower bound (5-58a):




Let us apply this lower bound to the MLP3:3:1 regressing the temperature T for the Weather dataset, with Pe = 0.08, as analysed in section 5.5.3. The total variation of T(x) is 1573.9 °C, which we rescale to the [0, 1] interval by dividing by the range max(T(x)) − min(T(x)), obtaining 59.4. Applying formula (5-58a), the estimated lower bound is 23, which is amply exceeded by the number of training samples (378).
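A short Python sketch of this check follows; all figures are the ones quoted above, and the implied temperature range is back-computed from them rather than taken from the Weather dataset itself:

```python
# Figures quoted above for the MLP3:3:1 temperature regression.
total_variation = 1573.9      # total variation of T(x), in degrees Celsius
scaled_variation = 59.4       # total variation referred to the [0, 1] interval
lower_bound = 23              # estimated lower bound from formula (5-58a), Pe = 0.08
n_train = 378                 # number of available training samples

# Implied max(T(x)) - min(T(x)) range (back-computed, not a dataset value).
t_range = total_variation / scaled_variation
print(f"implied temperature range: {t_range:.1f} C")          # about 26.5 C

# The available training set amply exceeds the estimated lower bound.
print(f"{n_train} samples >= lower bound {lower_bound}: {n_train >= lower_bound}")
```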


                                  5.6.5  Risk Minimization

In the preceding section we saw how to characterize the complexity of a neural network using the Vapnik-Chervonenkis and the fat-shattering dimensions, for classification and regression problems, respectively. We will now see how this complexity influences the classification or regression risk of a neural network. For this purpose, let us denote by R(w) the risk incurred when using a neural network with weight vector w:






Notice that this is the same formula as (4-20a), section 4.2.1, now applied to a continuous target space T. As we did in that section, we will discuss the simplified situation, without impairing the main conclusions, of zero loss for correct classifications and equal loss for wrong classifications. R(w) will then be, simply, the classifier's true error Pe(w). As usual, we do not know the true error, but we are able to obtain the design set error estimate, $\hat{P}_{ed}(w)$, by counting the number of individual errors. An important result due to Vapnik (see Vapnik, 1998) relates these two errors, for any weight vector w and an arbitrarily small quantity ε, as follows:

                                                               with
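The design set error estimate mentioned above is just an error-counting frequency. The following Python sketch (illustrative, with made-up data and a toy classifier) computes it for a generic classifier, standing in for $\hat{P}_{ed}(w)$:

```python
import numpy as np

def design_set_error(classifier, X, t):
    """Design set error estimate: fraction of individual errors on the design set."""
    predictions = np.array([classifier(x) for x in X])
    return float(np.mean(predictions != t))

# Toy threshold classifier and made-up design set (not from the text).
X = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.6])
t = np.array([0, 0, 1, 1, 1, 0])
classifier = lambda x: int(x > 0.5)
print(design_set_error(classifier, X, t))   # 2 errors in 6 samples -> 0.333...
```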