Page 400 - Numerical Methods for Chemical Engineering
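$H(\theta) = \nabla_\theta^2 F_{\text{cost}}$ with the elements
$$
H_{ab}(\theta) = \sum_{k=1}^{N} \left( \left. \frac{\partial f}{\partial \theta_a} \right|_{x^{[k]};\theta} \right) \left( \left. \frac{\partial f}{\partial \theta_b} \right|_{x^{[k]};\theta} \right) - \sum_{k=1}^{N} \left[ y^{[k]} - f\left(x^{[k]};\theta\right) \right] \left. \frac{\partial^2 f}{\partial \theta_a \partial \theta_b} \right|_{x^{[k]};\theta} \tag{8.79}
$$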
We note again that convergence of Newton’s method does not require the use of this
exact Hessian, but convergence to a minimum does require the approximate Hessian to
be positive-definite at each iteration. If we define the linearized design matrix with the
elements
$$
X_{ka}(\theta) = \left. \frac{\partial f}{\partial \theta_a} \right|_{x^{[k]};\theta} \tag{8.80}
$$
that agrees with our previous definition in the special case of a linear model, the Hessian
then has elements
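$$
H_{ab}(\theta) = \left( X^T X|_\theta \right)_{ab} - \sum_{k=1}^{N} \left[ y^{[k]} - f\left(x^{[k]};\theta\right) \right] \left. \frac{\partial^2 f}{\partial \theta_a \partial \theta_b} \right|_{x^{[k]};\theta} \tag{8.81}
$$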
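The two contributions to the Hessian can be compared numerically. The sketch below is illustrative only: the exponential-decay model, the synthetic data, and all function names are assumptions that do not come from the text, and the cost function is assumed to be $F_{\text{cost}} = \tfrac{1}{2}\sum_k [y^{[k]} - f(x^{[k]};\theta)]^2$, which is the form consistent with the Hessian elements above.

```python
import numpy as np

# Hypothetical model chosen for illustration: f(x; theta) = theta0 * exp(-theta1 * x)
def f(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

def design_matrix(x, theta):
    """X[k, a] = df/dtheta_a evaluated at x[k]."""
    e = np.exp(-theta[1] * x)
    return np.column_stack([e, -theta[0] * x * e])

def second_derivs(x, theta):
    """D[k, a, b] = d2f/(dtheta_a dtheta_b) evaluated at x[k]."""
    e = np.exp(-theta[1] * x)
    D = np.zeros((x.size, 2, 2))
    D[:, 0, 1] = D[:, 1, 0] = -x * e
    D[:, 1, 1] = theta[0] * x**2 * e
    return D

def cost(x, y, theta):
    # Assumed cost function: half the sum of squared residuals
    return 0.5 * np.sum((y - f(x, theta)) ** 2)

def exact_hessian(x, y, theta):
    """First term X^T X minus the residual-weighted second-derivative term."""
    X = design_matrix(x, theta)
    r = y - f(x, theta)
    return X.T @ X - np.einsum("k,kab->ab", r, second_derivs(x, theta))

def fd_hessian(x, y, theta, h=1e-4):
    """Central finite-difference Hessian of the cost, used only as a check."""
    n = theta.size
    I = np.eye(n)
    H = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            H[a, b] = (cost(x, y, theta + h * I[a] + h * I[b])
                       - cost(x, y, theta + h * I[a] - h * I[b])
                       - cost(x, y, theta - h * I[a] + h * I[b])
                       + cost(x, y, theta - h * I[a] - h * I[b])) / (4 * h * h)
    return H

# Synthetic data with a deterministic perturbation (not from the text)
x = np.linspace(0.0, 2.0, 9)
y = f(x, np.array([2.0, 1.5])) + 0.05 * np.cos(7.0 * x)
theta = np.array([1.5, 1.0])

X = design_matrix(x, theta)
H_gn = X.T @ X                      # first contribution only
H_exact = exact_hessian(x, y, theta)
H_fd = fd_hessian(x, y, theta)
```

Here `H_exact` should agree with `H_fd` to finite-difference accuracy, while `H_gn` differs from both by the residual-weighted term, which shrinks as the fit improves.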
If we approximate the Hessian by retaining only the first contribution, we have an approximation that is always at least positive-semidefinite,
$$
X^T X|_\theta \approx H(\theta) \tag{8.82}
$$
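The semidefiniteness claim follows directly from the form of the approximation. As a brief check, for any vector $v$ of the same dimension as $\theta$:

```latex
v^{T} \left( X^{T} X \right) v = (X v)^{T} (X v) = \lVert X v \rVert_2^2 \ge 0 ,
\qquad \text{with equality if and only if } X v = 0 .
```

The approximation is therefore positive-definite exactly when the columns of $X$ are linearly independent, and has a zero eigenvalue otherwise.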
The gradient components also are expressed simply in terms of $X|_\theta$,
$$
\gamma_a(\theta) = -\sum_{k=1}^{N} \left[ y^{[k]} - f\left(x^{[k]};\theta\right) \right] \left( X_{ka}|_\theta \right) \tag{8.83}
$$
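In matrix form the gradient is $\gamma = -X^T r$, where $r$ is the vector of residuals $y^{[k]} - f(x^{[k]};\theta)$. A minimal sketch that checks this against finite differences is given below; the model, the data, and the function names are illustrative assumptions, and the cost function is again assumed to be half the sum of squared residuals.

```python
import numpy as np

# Hypothetical model for illustration: f(x; theta) = theta0 * exp(-theta1 * x)
def f(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

def design_matrix(x, theta):
    """X[k, a] = df/dtheta_a evaluated at x[k]."""
    e = np.exp(-theta[1] * x)
    return np.column_stack([e, -theta[0] * x * e])

def cost(x, y, theta):
    # Assumed cost function: half the sum of squared residuals
    return 0.5 * np.sum((y - f(x, theta)) ** 2)

def gradient(x, y, theta):
    """gamma = -X^T r, with r the residual vector."""
    r = y - f(x, theta)
    return -design_matrix(x, theta).T @ r

# Synthetic data with a deterministic perturbation (not from the text)
x = np.linspace(0.0, 2.0, 9)
y = f(x, np.array([2.0, 1.5])) + 0.05 * np.cos(7.0 * x)
theta = np.array([1.5, 1.0])

gamma = gradient(x, y, theta)

# Central finite differences of the cost, one parameter at a time
h = 1e-6
gamma_fd = np.array([
    (cost(x, y, theta + h * e) - cost(x, y, theta - h * e)) / (2 * h)
    for e in np.eye(theta.size)
])
```

A finite-difference comparison of this kind is a cheap way to catch sign or indexing errors in a hand-coded design matrix before using it in an optimizer.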
As in the linear least-squares method, it is possible that (8.82) may have eigenvalues near or
equal to zero that make the Newton update system ill-conditioned. This may be corrected by
adding a small positive scalar along the diagonal. Newton iteration using this modification
is the most common approach to nonlinear least squares, and is known as the Levenberg–Marquardt method. Starting from an initial guess $\theta^{[0]}$, the Newton update at iteration $m$ is
$$
\theta^{[m+1]} = \theta^{[m]} + \alpha^{[m]} p^{[m]}, \qquad \left( X^T X|_{\theta^{[m]}} + \tau^{[m]} I \right) p^{[m]} = -\gamma\left(\theta^{[m]}\right) \tag{8.84}
$$
$\alpha^{[m]}$ is obtained from a weak line search. For a linear model with $\tau^{[m]} = 0$ and a full step $\alpha^{[m]} = 1$, this method converges after a single iteration. When the approximate Hessian is (nearly) singular, the trust-region Newton
method is preferred over the line-search algorithm shown here.
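The update (8.84) can be sketched in a few lines. This is a minimal illustration rather than a reference implementation. Several choices are assumptions not specified in the text: a fixed damping value `tau` instead of an iteration-dependent $\tau^{[m]}$, a backtracking (Armijo) rule standing in for the weak line search, the stopping tolerance, and the example models, data, and function names.

```python
import numpy as np

def levenberg_marquardt(f, jac, x, y, theta0, tau=1e-3, tol=1e-10, max_iter=100):
    """Damped Gauss-Newton iteration in the form of (8.84).

    f(x, theta) returns model predictions; jac(x, theta) returns X[k, a].
    tau is held fixed here, which is a simplifying assumption.
    """
    theta = np.asarray(theta0, dtype=float)

    def cost(th):
        return 0.5 * np.sum((y - f(x, th)) ** 2)

    for _ in range(max_iter):
        X = jac(x, theta)
        gamma = -X.T @ (y - f(x, theta))           # gradient, as in (8.83)
        if np.linalg.norm(gamma) < tol:
            break
        # Solve (X^T X + tau I) p = -gamma for the search direction
        p = np.linalg.solve(X.T @ X + tau * np.eye(theta.size), -gamma)
        # Backtracking line search: halve alpha until sufficient decrease
        alpha, F0, slope = 1.0, cost(theta), gamma @ p
        while cost(theta + alpha * p) > F0 + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        theta = theta + alpha * p
    return theta

# Hypothetical nonlinear model: f(x; theta) = theta0 * exp(-theta1 * x)
def f_exp(x, th):
    return th[0] * np.exp(-th[1] * x)

def jac_exp(x, th):
    e = np.exp(-th[1] * x)
    return np.column_stack([e, -th[0] * x * e])

# Hypothetical linear model: f(x; theta) = theta0 + theta1 * x
def f_lin(x, th):
    return th[0] + th[1] * x

def jac_lin(x, th):
    return np.column_stack([np.ones_like(x), x])

x = np.linspace(0.0, 3.0, 20)

# Noise-free synthetic data, so the minimizer is the generating parameter vector
theta_true = np.array([2.0, 1.5])
y_exp = f_exp(x, theta_true)
theta_exp = levenberg_marquardt(f_exp, jac_exp, x, y_exp, [1.5, 1.0])

# Linear model with tau = 0: a single full step reaches the least-squares solution
y_lin = 1.0 + 0.5 * x + 0.1 * np.sin(5.0 * x)
theta_lin = levenberg_marquardt(f_lin, jac_lin, x, y_lin, [0.0, 0.0],
                                tau=0.0, max_iter=1)
theta_lstsq = np.linalg.lstsq(jac_lin(x, np.zeros(2)), y_lin, rcond=None)[0]
```

With `tau=0.0` the linear system is solved without damping, so `np.linalg.solve` will fail if $X^T X$ is exactly singular; a positive `tau` avoids this at the cost of shorter steps.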
Selecting a prior for single-response data
We now return to the question of proposing a prior, based first upon the assumption of prior
independence, p(θ,σ) = p(θ)p(σ), such that
p(θ,σ|y) ∝ l(θ,σ|y)p(θ)p(σ) (8.85)