
We should also mention that in the case of the batch or minibatch update strategy, the computation of total error function values, as well as its derivatives, can be efficiently parallelized. In order to do that, we need to divide the data set into multiple subsets, compute partial sums of the error function and its derivatives over the training examples of each subset in parallel, and then sum the results. This is not possible in the case of stochastic updates. In the case of an SGD method, we can parallelize the gradient computations by neurons of each layer.
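A minimal Python sketch of this data-parallel scheme (not from the book): a toy linear least-squares error stands in for the network error, the data set is split into subsets, partial sums of the error and its gradient are computed for each subset concurrently, and the partial results are then added. The model, data, and number of chunks are illustrative assumptions.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    # Toy model used only for illustration: y_hat = X w, squared error.
    # (Hypothetical stand-in for the network error E(W); not the book's model.)
    def partial_error_and_grad(w, X_chunk, Y_chunk):
        """Partial sums of the error and its gradient over one data subset."""
        r = X_chunk @ w - Y_chunk                 # residuals on this chunk
        err = 0.5 * np.sum(r ** 2)                # partial sum of the error
        grad = X_chunk.T @ r                      # partial sum of the gradient
        return err, grad

    def total_error_and_grad(w, X, Y, n_chunks=4):
        """Batch error/gradient via parallel partial sums over data subsets."""
        X_parts = np.array_split(X, n_chunks)
        Y_parts = np.array_split(Y, n_chunks)
        with ThreadPoolExecutor(max_workers=n_chunks) as pool:
            results = list(pool.map(lambda args: partial_error_and_grad(w, *args),
                                    zip(X_parts, Y_parts)))
        err = sum(e for e, _ in results)          # sum the partial errors
        grad = sum(g for _, g in results)         # sum the partial gradients
        return err, grad

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 5))
        Y = X @ np.arange(1.0, 6.0) + 0.01 * rng.normal(size=1000)
        E, g = total_error_and_grad(np.zeros(5), X, Y)
        print(E, np.linalg.norm(g))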
Finally, we note that any iterative method requires a stopping criterion used to terminate the procedure. One simple option is a test based on first-order necessary conditions for a local minimum, i.e.,

    \| \nabla E(W^{(k)}) \| < \varepsilon_g .    (2.53)

We can also terminate iterations if it seems that no progress is made, i.e.,

    E(W^{(k)}) - E(W^{(k+1)}) < \varepsilon_E ,
    \| W^{(k)} - W^{(k+1)} \| < \varepsilon_w .    (2.54)

In order to prevent an infinite loop in the case of algorithm divergence, we might stop when a certain maximum number of iterations has been performed, i.e.,

    k \geq \bar{k} .    (2.55)
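The sketch below shows one way the tests (2.53)–(2.55) might be combined in a basic gradient descent loop; the quadratic error function, step size, and threshold values are illustrative assumptions rather than quantities from the text, and whether the two inequalities in (2.54) must hold jointly or separately is a design choice (here both are required).

    import numpy as np

    def train(error_and_grad, w0, lr=1e-2,
              eps_g=1e-6, eps_E=1e-12, eps_w=1e-10, k_max=10_000):
        """Gradient descent with the stopping tests (2.53)-(2.55)."""
        w = np.asarray(w0, dtype=float)
        E, g = error_and_grad(w)
        for k in range(k_max):                        # (2.55): at most k_max steps
            if np.linalg.norm(g) < eps_g:             # (2.53): small gradient
                break
            w_new = w - lr * g                        # basic gradient step
            E_new, g_new = error_and_grad(w_new)
            if (E - E_new < eps_E and
                    np.linalg.norm(w - w_new) < eps_w):   # (2.54): no progress
                w, E = w_new, E_new
                break
            w, E, g = w_new, E_new, g_new
        return w, E

    if __name__ == "__main__":
        # Illustrative quadratic error E(w) = 0.5 * ||A w - b||^2 (assumption).
        A = np.array([[2.0, 0.0], [0.0, 1.0]])
        b = np.array([1.0, -1.0])
        f = lambda w: (0.5 * np.sum((A @ w - b) ** 2), A.T @ (A @ w - b))
        w_opt, E_opt = train(f, np.zeros(2))
        print(w_opt, E_opt)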
2.2.2 Static Neural Network Training

In this subsection, we consider the function approximation problem. The problem is stated as follows. Suppose that we wish to approximate an unknown mapping f: X \to Y, where X \subset \mathbb{R}^{n_x} and Y \subset \mathbb{R}^{n_y}. Assume we are given an experimental data set of the form

    \{ (x^{(p)}, \tilde{y}^{(p)}) \}_{p=1}^{P} ,    (2.56)

where x^{(p)} \in X represent the input vectors and \tilde{y}^{(p)} \in Y represent the observed output vectors. Note that in general the observed outputs \tilde{y}^{(p)} do not match the true outputs y^{(p)} = f(x^{(p)}). We assume that the observations are corrupted by additive Gaussian noise, i.e.,

    \tilde{y}^{(p)} = y^{(p)} + \eta^{(p)} ,    (2.57)

where \eta^{(p)} represent the sample points of a zero-mean random vector \eta \sim \mathcal{N}(0, \Sigma) with diagonal covariance matrix

    \Sigma = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_{n_y}^2 \end{pmatrix} .
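As a small illustration of the data model (2.56)–(2.57), the following snippet samples inputs, evaluates a stand-in mapping f, and adds zero-mean Gaussian noise with diagonal covariance; the particular mapping, dimensions, and noise levels are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    P, n_x, n_y = 200, 2, 2          # data set size and dimensions (illustrative)
    sigma = np.array([0.05, 0.10])   # noise standard deviations, one per output

    def f(x):
        """Stand-in for the unknown mapping f: R^{n_x} -> R^{n_y} (assumption)."""
        return np.array([np.sin(x[0]) + x[1], x[0] * x[1]])

    X = rng.uniform(-1.0, 1.0, size=(P, n_x))       # inputs x^(p)
    Y_true = np.array([f(x) for x in X])            # true outputs y^(p) = f(x^(p))
    noise = rng.normal(0.0, sigma, size=(P, n_y))   # eta^(p) ~ N(0, diag(sigma^2))
    Y_obs = Y_true + noise                          # observed outputs, Eq. (2.57)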
The approximation is to be performed using a layered feedforward neural network of the form (2.8). Under the abovementioned assumptions on the observation noise, it is reasonable to utilize a least-squares error function. Thus, we have a total error function \bar{E} of the form (2.25) with the individual errors

    E^{(p)}(W) = \frac{1}{2} \left( \tilde{y}^{(p)} - \hat{y}^{(p)} \right)^{\mathrm{T}} \Omega \left( \tilde{y}^{(p)} - \hat{y}^{(p)} \right) ,    (2.58)

where \hat{y}^{(p)} represent the neural network outputs given the corresponding inputs x^{(p)} and weights W. The diagonal matrix \Omega of fixed "error weights" has the form

    \Omega = \begin{pmatrix} \omega_1 & & 0 \\ & \ddots & \\ 0 & & \omega_{n_y} \end{pmatrix} ,

where \omega_i are usually taken to be inversely proportional to the noise variances.
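A brief sketch of the individual error (2.58) and of a total error taken here as the plain sum over the data set (the exact total form is the one given by (2.25) in the text); the error weights are chosen as omega_i = 1/sigma_i^2, following the remark above, and the observed and predicted outputs are placeholder arrays.

    import numpy as np

    def individual_error(y_obs, y_hat, omega):
        """E^(p)(W) = 0.5 * (y~ - y^)^T Omega (y~ - y^), with Omega = diag(omega)."""
        e = y_obs - y_hat
        return 0.5 * e @ np.diag(omega) @ e

    def total_error(Y_obs, Y_hat, omega):
        """Total error taken as the sum of the individual errors (assumption)."""
        return sum(individual_error(yo, yh, omega)
                   for yo, yh in zip(Y_obs, Y_hat))

    # Error weights inversely proportional to the noise variances (as in the text).
    sigma = np.array([0.05, 0.10])        # assumed noise standard deviations
    omega = 1.0 / sigma ** 2              # omega_i = 1 / sigma_i^2

    # Placeholder observations and network predictions, for illustration only.
    Y_obs = np.array([[0.1, 0.2], [0.0, -0.3]])
    Y_hat = np.array([[0.12, 0.18], [-0.02, -0.25]])
    print(total_error(Y_obs, Y_hat, omega))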
We need to minimize the total approximation error \bar{E} with respect to the neural network parameters W. If activation functions of all the neurons are smooth, then the error function is also