smooth. Hence, the minimization can be carried out using any of the optimization methods described in Section 2.2.1. However, in order to apply those methods, we need an efficient algorithm to compute the gradient and Hessian of the error function with respect to the parameters. As mentioned above, the total error gradient ∇Ē and Hessian ∇²Ē may be expressed in terms of the individual error gradients ∇E^(p) and Hessians ∇²E^(p). Thus, all that remains is to compute the derivatives of E^(p). For notational convenience, in the remainder of this section we omit the training example index p.

There exist several approaches to the computation of error function derivatives:

• numeric differentiation;
• symbolic differentiation;
• automatic (or algorithmic) differentiation.

The numeric differentiation approach relies on the definition of the derivative and approximates it via finite differences. This method is very simple to implement, but it suffers from truncation and roundoff errors. It is especially inaccurate for higher-order derivatives. It also requires many function evaluations: for example, in order to estimate the error function gradient with respect to n_w parameters using the simplest forward difference scheme, we require error function values at n_w + 1 points.
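As a concrete illustration of that evaluation count, the following minimal sketch (our own example, not taken from the original text; the function names and the step size are arbitrary) estimates the gradient with the forward difference scheme and uses exactly n_w + 1 evaluations of the error function.

```python
import numpy as np

def forward_difference_gradient(error_fn, w, h=1e-6):
    """Estimate the gradient of a scalar error function at the point w
    using forward differences: (E(w + h*e_i) - E(w)) / h.

    Requires n_w + 1 evaluations of error_fn, where n_w = len(w);
    the result suffers from truncation and roundoff errors.
    """
    w = np.asarray(w, dtype=float)
    e0 = error_fn(w)                  # 1 evaluation at the base point
    grad = np.empty_like(w)
    for i in range(w.size):           # n_w additional evaluations
        w_step = w.copy()
        w_step[i] += h
        grad[i] = (error_fn(w_step) - e0) / h
    return grad

# Usage on a toy quadratic "error function" (hypothetical example)
if __name__ == "__main__":
    E = lambda w: 0.5 * np.sum(w ** 2)
    print(forward_difference_gradient(E, np.array([1.0, -2.0, 3.0])))
    # exact gradient is [1.0, -2.0, 3.0]; the estimate matches up to O(h)
```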
Symbolic differentiation transforms a symbolic expression for the original function (usually represented in the form of a computational graph) into symbolic expressions for its derivatives by applying the chain rule. The resulting expressions may be evaluated at any point accurately to working precision. However, these expressions usually end up containing many identical subexpressions, which leads to duplicate computations (especially when we need the derivatives with respect to multiple parameters). In order to avoid this, we need to simplify the expressions for the derivatives, which presents a nontrivial problem.
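The duplicate-subexpression issue can be seen in a small case; the sketch below is our own illustration (assuming the SymPy library, which is not mentioned in the original text) of symbolic derivatives that repeat the same inner terms, followed by common subexpression elimination.

```python
import sympy as sp

x, y = sp.symbols("x y")

# A function with a shared inner expression, as in a computational graph
u = sp.sin(x * y) + sp.exp(x * y)

# Symbolic derivatives with respect to both variables
du_dx = sp.diff(u, x)   # y*cos(x*y) + y*exp(x*y)
du_dy = sp.diff(u, y)   # x*cos(x*y) + x*exp(x*y)

# Both derivatives repeat the subexpressions cos(x*y) and exp(x*y);
# common subexpression elimination recovers the shared intermediate values
replacements, reduced = sp.cse([du_dx, du_dy])
print(replacements)     # e.g. [(x0, x*y), (x1, exp(x0)), (x2, cos(x0))]
print(reduced)          # derivatives rewritten in terms of x0, x1, x2
```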
The automatic differentiation technique [64] computes function derivatives at a point by applying the chain rule to the corresponding numerical values instead of symbolic expressions. This method produces accurate derivative values, just like symbolic differentiation, and also allows for a certain performance optimization. Note that automatic differentiation relies on the original computational graph for the function to be differentiated. Thus, if the original graph makes use of some common intermediate values, they will be efficiently reused by the differentiation procedure. Automatic differentiation is especially useful for neural network training, since it scales well to multiple parameters as well as to higher-order derivatives. In this book, we adopt the automatic differentiation approach.
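A minimal sketch of this idea (our own illustration, not the implementation adopted in the book) is a dual-number class that propagates a value together with its derivative: the chain rule is applied to numbers rather than to symbolic expressions, and a shared intermediate value such as t below is computed only once. This corresponds to the forward mode discussed next.

```python
class Dual:
    """A value paired with its derivative with respect to a chosen input.

    Arithmetic on Dual objects applies the chain rule to numerical values,
    which is the essence of forward-mode automatic differentiation.
    """
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule applied to numbers, not symbols
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

# Differentiate E(w) = (w*w + 3*w)**2 at w = 2:
# the shared intermediate value t = w*w + 3*w is evaluated only once.
w = Dual(2.0, 1.0)          # seed derivative dw/dw = 1
t = w * w + 3 * w           # t.value = 10, t.deriv = 7
E = t * t                   # dE/dw = 2 * t.value * t.deriv = 140
print(E.value, E.deriv)     # 100.0 140.0
```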
Automatic differentiation encompasses two different modes of computation: forward and reverse. Forward mode computes the sensitivities of all variables with respect to the input variables: it starts with the intermediate variables that explicitly depend on the input variables (the most deeply nested subexpressions) and proceeds “forward” by applying the chain rule, until the output variables are processed. Reverse mode computes the sensitivities of the output variables with respect to all variables: it starts with the intermediate variables on which the output variables explicitly depend (the outermost subexpressions) and proceeds “in reverse” by applying the chain rule, until the input variables are processed. Each mode has its own advantages and disadvantages. The forward mode allows one to compute function values as well as derivatives of multiple orders in a single pass. On the other hand, in order to compute the rth-order derivative using the reverse mode, one needs the derivatives of all the lower orders s = 0, ..., r − 1 beforehand. The computational complexity of first-order derivative computation in the forward mode is proportional to the number of inputs, while in the reverse mode it is proportional to the number of outputs. In our case, there is only one out-