
5.6 Performance of Neural Networks


Stock Exchange data the standard deviation of the SONAE share values is 2070 Escudos, whereas for the prediction errors, it is only 224 Escudos.
  As a matter of fact, a good ranking index for neural network comparison is:

$$ s_{e/t} = \frac{s_e}{s_t}, $$

where $s_e$ is the standard deviation of the prediction errors and $s_t$ is the standard deviation of the target values.
  This is a normalized index, with value in the [0, 1] interval for all networks, therefore affording more insight when comparing networks. For instance, for the previous Stock Exchange figures, $s_{e/t} = 224/2070 \approx 0.11$ represents a good regression solution.
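As a minimal sketch (ours, not the book's; the function and variable names are illustrative), the index can be computed directly from the targets and the network predictions:

```python
import numpy as np

def ranking_index(targets, predictions):
    """Standard deviation of the prediction errors divided by the
    standard deviation of the target values; closer to 0 is better."""
    targets = np.asarray(targets, dtype=float)
    errors = targets - np.asarray(predictions, dtype=float)
    return errors.std() / targets.std()

# The Stock Exchange figures of the text give 224 / 2070, i.e. about 0.11.
print(224 / 2070)
```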
  Another possibility for comparison of NN solutions is the use of the ROC curve area method, described in section 4.3.3.
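As a hedged illustration of that comparison, scikit-learn's roc_auc_score (a library choice of ours, not mentioned in the book) can rank two networks evaluated on the same two-class test set; the labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical test-set labels and the continuous outputs of two networks.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores_net1 = np.array([0.2, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.5])
scores_net2 = np.array([0.3, 0.6, 0.7, 0.4, 0.8, 0.2, 0.9, 0.5])

# The network with the larger area under the ROC curve ranks higher.
print(roc_auc_score(y_true, scores_net1), roc_auc_score(y_true, scores_net2))
```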
  Estimation of confidence intervals for neural network errors can be done in a "model-free" approach, as indicated in section 4.5. As seen in that section, confidence intervals obtained by this "model-free" approach can be unrealistically large. More realistic confidence intervals are harder to compute. The respective formulas were derived by Chryssolouris et al. (1996) using the so-called Jacobian matrix of the neural network, a matrix whose elements are the derivatives of the network outputs with respect to the weights.
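The following is only a sketch of the general delta-method idea behind such intervals, not the exact formulas of Chryssolouris et al. (1996); the network function f(X, w) and all names are assumptions, and the Jacobian is taken numerically:

```python
import numpy as np
from scipy import stats

def jacobian_confidence_intervals(f, w_hat, X, y, alpha=0.05, eps=1e-6):
    """Approximate confidence intervals for a fitted network f(X, w).

    F holds the derivatives of the network outputs with respect to the
    fitted weights w_hat (finite differences here, for simplicity)."""
    y_hat = f(X, w_hat)
    n, p = len(y), len(w_hat)
    F = np.empty((n, p))
    for j in range(p):                       # numerical Jacobian, column j
        w = w_hat.copy()
        w[j] += eps
        F[:, j] = (f(X, w) - y_hat) / eps
    s2 = np.sum((y - y_hat) ** 2) / (n - p)  # residual variance estimate
    A = np.linalg.inv(F.T @ F)
    half = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(
        s2 * np.einsum('ij,jk,ik->i', F, A, F))
    return y_hat - half, y_hat + half

# Toy usage with a one-weight linear "network".
X = np.linspace(0.0, 1.0, 20)
y = 2.0 * X + 0.1 * np.random.default_rng(0).normal(size=20)
lo, hi = jacobian_confidence_intervals(lambda X, w: w[0] * X,
                                       np.array([2.0]), X, y)
```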
  It is interesting to see the implications of using the squared error criterion for MLP training. For this purpose let us imagine that we have obtained output functions $z_k$, modelling each class of the input data, such that the target values differ from $z_k$ by an error $e_k$:

$$ t_k = z_k + e_k . $$
  For a given training set X with n patterns and target set T, assuming that the distributions of the target values conditioned by the patterns, $p(t_k(\mathbf{x}_i) \mid \mathbf{x}_i)$, are independent, we can compute the likelihood of the dataset in a similar way as we did in (4-22):

$$ L = \prod_{i=1}^{n} \prod_{k} p\bigl(t_k(\mathbf{x}_i) \mid \mathbf{x}_i\bigr). $$
  Assuming Gaussian errors with zero mean and equal variance $\sigma^2$, and since the $z_k$ are deterministic, the logarithm of the likelihood is:

$$ \ln L = -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \sum_{k} \bigl(t_k(\mathbf{x}_i) - z_k(\mathbf{x}_i)\bigr)^{2} + \text{const.} = -\frac{E}{\sigma^{2}} + \text{const.},$$

where E is the error energy expressed as in (5-2a).
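A quick numerical check of this equivalence (our own illustration, assuming the 1/2 factor in the error energy of (5-2a)): the Gaussian log-likelihood equals $-E/\sigma^2$ plus a constant that does not depend on the outputs $z_k$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 0.5
z = rng.normal(size=100)                    # deterministic network outputs
t = z + rng.normal(scale=sigma, size=100)   # targets = outputs + Gaussian error

log_L = np.sum(norm.logpdf(t, loc=z, scale=sigma))
E = 0.5 * np.sum((t - z) ** 2)              # error energy, assuming E = 1/2 * sum((t - z)**2)
const = -len(t) * np.log(np.sqrt(2.0 * np.pi) * sigma)

# Maximizing the likelihood in z is therefore the same as minimizing E.
assert np.isclose(log_L, -E / sigma**2 + const)
```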
  We conclude that minimizing the squared error is equivalent to finding the output functions that maximize the likelihood of the training set. By analysing the