
5.4 HOMOTOPY CONTINUATION TRAINING METHOD FOR SEMIEMPIRICAL ANN-BASED MODELS
It is due to these additional parameters $a$ that the Jacobian of $H$ will have full rank for all $\tau \in [0, 1)$.

The following theorem [31] offers theoretical justification for the probability-one homotopy methods.
                          Theorem 6. Let H: R n w  ×[0,1) × R n w  → R n w  either we transform an optimization problem to
                                2
                          be a C -smooth vector-valued function, and let  a system of equations and then construct a ho-
                          H a :[0,1) × R n w  → R n w  be a vector-valued func-  motopy for this system, or we construct a homo-
                          tion which satisfies H a (τ,w) ≡ H(a,τ,w). Assume  topy for the error function and then differentiate
                          that the zero vector 0 ∈ R n w  is a regular value of H.  it to obtain a homotopy for the equation system.
                          Finally, assume that for each value of additional pa-  The homotopy continuation approach has
                          rameters a ∈ R n w  the equation system H a (0,w) = 0  been previously applied to a feedforward neu-
                          has a unique solution ˜ w. Then, for almost all a ∈ R n w  ral network training problem. Some authors [32,
                                       2
                          there exists a C -smooth curve γ ⊂[0,1) × R ,  33] have applied the convex homotopy (5.43)as
                                                                  n w
                          emanating from (0, ˜ w) and satisfying H a (τ,w) = 0  well as the so-called “global” homotopy of the
                          ∀(τ,w) ∈ γ . Also, if the curve γ is bounded, then it  form
                                                                  n w
                          has an accumulation point (1, ¯ w) for some ¯ w ∈ R .
                          Moreover, if the Jacobian of H a at the point (1, ¯ w)  H(a,τ,w) = F(w) − (1 − τ)F(a)  (5.44)
                          has full rank, then the curve γ has a finite arc length.
Since the zero vector is a regular value, the Jacobian of $H$ has full rank at all points of the curve $\gamma$; therefore this curve has no self-intersections or intersections with the other solution curves of $H_a$. Also, since the equation system $H_a(0, w) = 0$ has a unique solution, the curve $\gamma$ cannot return to cross the hyperplane $\tau = 0$. The convex homotopy (5.43) satisfies all the conditions of Theorem 6 for any $C^2$-smooth function $F$ with $\tilde{w} \equiv a$. In order to guarantee the boundedness of $\gamma$, we may require that the equation system $H_a(\tau, w) = 0$ does not have solutions at infinity. This can be achieved by means of regularization.
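As a hedged illustration of such a regularization (the specific regularizer is not spelled out on this page; weight decay is one common choice), when $F$ is taken to be the gradient of the error function, as in the optimization setting discussed below, one may solve the regularized system

$$F_\lambda(w) = \frac{\partial \bar{E}(w)}{\partial w} + \lambda w = 0, \qquad \lambda > 0,$$

whose weight decay term penalizes large $\|w\|$ and thereby helps to keep the solution curve $\gamma$ within a bounded region.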
Although this method is designed for solving systems of nonlinear equations, it can be applied to optimization problems as well. In order to do that, we replace the error function minimization problem $\bar{E}(w) \to \min_w$ with the problem of finding a stationary point, $\partial \bar{E}(w) / \partial w = 0$, i.e., with a system of nonlinear equations. We should mention that these equations represent only the necessary conditions for a local minimum of the error function. These conditions are not sufficient, unless the error function is pseudoconvex. Hence, the solution $w_*$ of this system should be additionally verified: for example, if the Hessian $\partial^2 \bar{E}(w_*) / \partial w^2$ has full rank and all of its eigenvalues are positive, then the solution is a local minimum. Also note that we have two possibilities: either we transform an optimization problem into a system of equations and then construct a homotopy for this system, or we construct a homotopy for the error function and then differentiate it to obtain a homotopy for the equation system.
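To make the two possibilities concrete, here is a hedged sketch assuming the convex homotopy (5.43) has the standard form $H(a, \tau, w) = \tau F(w) + (1 - \tau)(w - a)$ (its exact form is given earlier in this section). Applying it directly to the stationarity system $F(w) = \partial \bar{E}(w)/\partial w$ gives

$$H_a(\tau, w) = \tau \, \frac{\partial \bar{E}(w)}{\partial w} + (1 - \tau)(w - a),$$

while deforming the error function itself, for example as $\tilde{E}(\tau, w) = \tau \bar{E}(w) + \tfrac{1}{2}(1 - \tau)\|w - a\|^2$, and then differentiating with respect to $w$ produces exactly the same map, so for this particular homotopy the two constructions coincide.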
The homotopy continuation approach has been previously applied to the feedforward neural network training problem. Some authors [32,33] have applied the convex homotopy (5.43) as well as the so-called "global" homotopy of the form

$$H(a, \tau, w) = F(w) - (1 - \tau)\, F(a) \qquad (5.44)$$

to the sum-of-squared-errors objective function. Gorse, Shepherd, and Taylor [34] have suggested a homotopy that scales the training set target outputs from their mean value at $\tau = 0$ to their original values at $\tau = 1$. Coetzee [35] has proposed a "natural" homotopy that transforms the neuron activation functions from linear to nonlinear ones ($\varphi(\tau, n) = (1 - \tau)\, n + \tau \operatorname{th} n$), thereby deforming the problem from linear to nonlinear regression. Coetzee has also suggested the use of regularization in order to keep the solution curve $\gamma$ bounded. The authors of [32,35] have also studied homotopy continuation methods that allow for a search for multiple solutions to the problem. However, in this book we are only concerned with a single-solution search.
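To illustrate how such a homotopy can be followed numerically, the listing below gives a minimal sketch in Python (a toy illustration with made-up names and data, not the training procedure developed in this book): a fixed grid in $\tau$ with a Newton corrector in $w$ at each step, applied to the convex homotopy of the gradient of a hypothetical one-neuron least-squares problem.

import numpy as np

def fd_jacobian(f, w, eps=1e-7):
    """Forward-difference Jacobian of f at w (illustrative helper)."""
    f0 = f(w)
    J = np.empty((f0.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += eps
        J[:, j] = (f(wp) - f0) / eps
    return J

def track_homotopy(H, a, w0, n_steps=50, newton_iters=20, tol=1e-10):
    """Follow the solution curve of H(a, tau, w) = 0 from tau = 0 to tau = 1."""
    w = np.array(w0, dtype=float)
    for tau in np.linspace(0.0, 1.0, n_steps + 1):
        for _ in range(newton_iters):
            r = H(a, tau, w)
            if np.linalg.norm(r) < tol:
                break
            J = fd_jacobian(lambda v: H(a, tau, v), w)
            w = w - np.linalg.solve(J, r)
    return w

# Toy usage: one-neuron model y = th(w[0]*x + w[1]), least-squares error,
# convex homotopy tau*grad_E(w) + (1 - tau)*(w - a) applied to its gradient.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 30)
y = np.tanh(1.5 * x - 0.5)            # synthetic targets, true parameters [1.5, -0.5]

def grad_E(w):
    z = w[0] * x + w[1]
    e = np.tanh(z) - y                # residuals
    s = e * (1.0 - np.tanh(z) ** 2)   # residuals times activation derivative
    return np.array([np.sum(s * x), np.sum(s)])

a = np.array([0.1, 0.1])              # randomly chosen additional parameters a
H = lambda a_, tau, w: tau * grad_E(w) + (1.0 - tau) * (w - a_)
w_star = track_homotopy(H, a, w0=a)   # at tau = 0 the unique solution is w = a
print(w_star)                         # expected to approach [1.5, -0.5]

A practical continuation code would step adaptively along the arc length of $\gamma$ and use an analytic Jacobian; the fixed $\tau$ grid and finite differences above are used only for brevity.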
However, these homotopies are less efficient for the recurrent neural network training problem, because the sensitivity of the individual trajectory error function (5.8) to the parameters $w$ grows exponentially over time. Thus, even for a moderate prediction horizon $\bar{t}$, the error function landscape becomes quite complicated. For instance, if we utilize the convex homotopy (5.43) and fail to choose a good initial guess $w^{(0)}$ for the parameters, then the error function growth will be very rapid