Page 400 - Numerical Methods for Chemical Engineering
P. 400

Selecting a prior for single-response data                          389



                           2
                  H(θ) =∇ F cost with the elements
                          θ
                             N  )       *)         *    N                )   2       *

                                  ∂ f       ∂ f             [k]     [k]      ∂ f
                    H ab (θ) =                       −     y  − f x ; θ

                                 ∂θ a x ;θ  ∂θ b x ;θ                      ∂θ a ∂θ b x ;θ
                                                                                  [k]
                                                [k]
                                      [k]
                            k=1                        k=1
                                                                                     (8.79)
                  We note again that convergence of Newton’s method does not require the use of this
                  exact Hessian, but convergence to a minimum does require the approximate Hessian to
                  be positive-definite at each iteration. If we define the linearized design matrix with the
                  elements

                                                       ∂ f
                                              X ka (θ) =                             (8.80)

                                                           [k]
                                                       ∂θ a x ;θ
                  that agrees with our previous definition in the special case of a linear model, the Hessian
                  then has elements
                                                              )           *
                                            N                     2
                                   T             [k]    [k]
                                            	                    ∂ f
                        H ab (θ) = X X| θ  −    y  − f x ; θ                         (8.81)
                                        ab
                                                                       [k]
                                            k=1                 ∂θ a ∂θ b x ;θ
                  If we approximate the Hessian by retaining only the first contribution, we have an approxi-
                  mation that is always at least positive-semidefinite,

                                                X X   ≈ H(θ)                         (8.82)
                                                 T
                                                    θ
                  The gradient components also are expressed simply in terms of X| θ ,
                                               N
                                                           [k]


                                     γ a (θ) =−  	   y  [k]  − f x ; θ ( X ka | )    (8.83)
                                                                     θ
                                              k=1
                  As in the linear least-squares method, it is possible that (8.82) may have eigenvalues near or
                  equal to zero that make the Newton update system ill-conditioned. This may be corrected by
                  adding a small positive scalar along the diagonal. Newton iteration using this modification
                  is the most common approach to nonlinear least squares, and is known as the Levenberg–
                                                            [0]
                  Marquardt method. Starting from an initial guess θ , the Newton update at iteration m
                  is
                                                T

                        θ [m+1]  = θ [m]  + α [m] [m]    X X| θ [m] + τ  [m] I p [m]  =−γ θ [m]     (8.84)

                                         p
                  α [m]  is obtained from a weak line search. For a linear model, this method converges after a
                  single iteration. When the approximate Hessian is (nearly) singular, the trust-region Newton
                  method is preferred over the line-search algorithm shown here.
                  Selecting a prior for single-response data

                  We now return to the question of proposing a prior, based first upon the assumption of prior
                  independence, p(θ,σ) = p(θ)p(σ), such that

                                         p(θ,σ|y) ∝ l(θ,σ|y)p(θ)p(σ)                 (8.85)
   395   396   397   398   399   400   401   402   403   404   405