This momentum term, with momentum factor α, tends to speed up the network convergence while avoiding oscillations. It acts in the same way as the mass of a particle falling on a surface in a viscous medium: away from a minimum, the particle's mass increases its speed along the downward trajectory; near the minimum, it dampens the oscillations around it. Similarly, the momentum term increases the effective learning rate in regions of low curvature and decreases it in regions of high curvature, thereby reducing oscillations in those regions (for details see Qian, 1999).
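In generic form (this is an illustrative expression; the notation of the book's own update formulas on the preceding pages may differ), the weight change at iteration t with a momentum term can be written as

\Delta w(t) = -\eta \, \partial E / \partial w + \alpha \, \Delta w(t-1),

where \eta is the learning rate and \alpha the momentum factor. Successive changes pointing in the same direction accumulate, enlarging the effective step, whereas changes of alternating sign largely cancel, which produces the damping effect just described.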
The previous weight updating formulas assume a pattern-by-pattern (online) operation mode. It is usually more efficient to compute the errors for all the patterns and update the weights using these total errors. This is the so-called batch training, already mentioned in section 5.1. An iteration using all of the available data is called an epoch, and training is conducted by repeating the weight updating process for a sufficiently large number of epochs.
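As a concrete illustration of batch training combined with a momentum term, the sketch below (in Python, with a hypothetical gradient routine and made-up data, neither of which comes from the text) accumulates the gradients of all patterns in an epoch and then applies a single weight update:

import numpy as np

def train_batch(weights, patterns, targets, grad_fn,
                eta=0.1, alpha=0.9, n_epochs=100):
    # Batch mode: one weight update per epoch, with a momentum term.
    prev_delta = np.zeros_like(weights)
    for _ in range(n_epochs):                 # one pass over all data = one epoch
        total_grad = np.zeros_like(weights)
        for x, t in zip(patterns, targets):   # accumulate the error gradient over all patterns
            total_grad += grad_fn(weights, x, t)
        delta = -eta * total_grad + alpha * prev_delta   # gradient step plus momentum
        weights = weights + delta
        prev_delta = delta
    return weights

# Example: a single linear output unit with a squared-error gradient (hypothetical data).
def sq_error_grad(w, x, t):
    return (w @ x - t) * x

patterns = np.array([[1.0, 0.5], [0.2, 1.0], [0.8, 0.3]])
targets = np.array([1.0, 0.0, 1.0])
w = train_batch(np.zeros(2), patterns, targets, sq_error_grad, eta=0.05)

In pattern-by-pattern (online) mode the delta would instead be computed and applied inside the inner loop, once for each pattern.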


                                 5.5.2  Practical aspects

When training multi-layer perceptrons, and other types of neural nets as well, several practical aspects must be taken into account; these are described next.

                                 Feature and architecture selection

When designing a neural net, one usually has to perform feature selection in the same way as when designing statistical classifiers. However, the classical search methods are more difficult or cumbersome to apply to neural nets, for two reasons: for a given architecture, any configuration of features at the network inputs demands a lengthy training process; and for a given configuration of features at the network inputs, the performance of the network depends on the architecture used. Feature set and architecture are therefore coupled. Concerning the first issue, we will later present a feature selection method based on genetic algorithms, which is fast and often produces good results. Regarding the second issue, one may implement search schemes for the "best" solution within a domain of interesting architectures. This is the approach implemented in Statistica under the name of Intelligent Problem Solver (IPS): once we have specified the type of network, the range of features and some constraints on the architecture, such as the number of hidden nodes, the IPS will automatically search for the "best" solutions.
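A minimal sketch of such a search over architectures, here simply scanning the number of hidden nodes and scoring each candidate on a held-out validation set, might look as follows. The IPS itself is proprietary, so this only illustrates the general idea, using scikit-learn's MLPClassifier and a standard example dataset rather than anything from the text:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_h, best_score = None, -1.0
for h in range(2, 11):                    # candidate numbers of hidden nodes
    net = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)                   # the lengthy part: one full training run per candidate
    score = net.score(X_val, y_val)       # validation accuracy of this architecture
    if score > best_score:
        best_h, best_score = h, score

print("best number of hidden nodes:", best_h, "validation accuracy:", best_score)

Note that every candidate architecture requires a complete training run, which is exactly why coupling this loop with an exhaustive feature-subset search quickly becomes impractical.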