The least-squares method reconsidered
Our proposed prior (8.68) makes use of the assumption of prior independence,
p(θ,σ) = p(θ)p(σ) (8.72)
which states that, prior to the experiment, the belief systems about θ and σ are independent.
The posterior density is then
p(θ,σ|y) ∝ l(θ,σ|y)p(θ)p(σ) (8.73)
Assuming the Gauss–Markov conditions hold and that the errors are normally distributed,
the likelihood function is
l(\theta, \sigma \mid y) = \left(\frac{1}{\sqrt{2\pi}}\right)^{\!N} \sigma^{-N} \exp\!\left[-\frac{1}{2\sigma^{2}}\, S(\theta)\right] \qquad (8.74)
and thus the posterior is
p(\theta, \sigma \mid y) \propto \left(\frac{1}{\sqrt{2\pi}}\right)^{\!N} \sigma^{-N} \exp\!\left[-\frac{1}{2\sigma^{2}}\, S(\theta)\right] p(\theta)\, p(\sigma) \qquad (8.75)
If, as we have assumed in (8.68), the prior is uniform in the region of appreciable nonzero
likelihood, p(θ) ∼ c, then the most probable value of θ, for any value of σ, is that which
minimizes S(θ), (8.58). Therefore, the least-squares method is justified statistically, as long
as the Gauss–Markov conditions hold and the errors are normally distributed.
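Taking the logarithm of (8.75) makes this reduction explicit; a brief sketch, writing const for terms that depend on neither θ nor σ:

\ln p(\theta, \sigma \mid y) = \mathrm{const} - N \ln\sigma - \frac{1}{2\sigma^{2}}\, S(\theta) + \ln p(\theta) + \ln p(\sigma)

With p(θ) ≈ c wherever the likelihood is appreciable, the only θ-dependence at fixed σ enters through −S(θ)/(2σ²), so the posterior mode in θ coincides with the minimizer of S(θ).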
It is shown in the supplemental material on the accompanying website that, in the fre-
quentist view, least squares is an unbiased estimator of the true value (i.e., if we repeat
the set of experiments many times, the average estimate equals the true value) provided
only that the zero-mean Gauss–Markov condition (8.50) is satisfied.
Numerical treatment of nonlinear least-squares problems
For a linear model $y^{[k]} = x^{[k]} \cdot \theta + \varepsilon^{[k]}$, we obtain the least-squares estimate by solving an
algebraic system $[X^{\mathrm{T}} X]\, \theta_{\mathrm{LS}} = X^{\mathrm{T}} y$. For a nonlinear model

y^{[k]} = f\!\left(x^{[k]}; \theta\right) + \varepsilon^{[k]} \qquad (8.76)
we must find the least-squares estimate through numerical optimization. For notational
convenience, we define the cost function as
F_{\mathrm{cost}}(\theta) = \frac{1}{2}\, S(\theta) = \frac{1}{2} \sum_{k=1}^{N} \left[ y^{[k]} - f\!\left(x^{[k]}; \theta\right) \right]^{2} \qquad (8.77)
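To make (8.77) concrete, the following minimal Python sketch evaluates the cost for a hypothetical two-parameter exponential-decay model; the names f_model, F_cost, x_data, and y_data and the choice of model are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical model f(x; theta) = theta_1 * exp(-theta_2 * x); the text leaves
# f(x^[k]; theta) general, so this particular form is for illustration only.
def f_model(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

# Cost function (8.77): F_cost(theta) = (1/2) * sum_k ( y^[k] - f(x^[k]; theta) )^2
def F_cost(theta, x_data, y_data):
    residuals = y_data - f_model(x_data, theta)
    return 0.5 * (residuals @ residuals)
```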
The gradient $\gamma = \nabla F_{\mathrm{cost}}$ of this cost function has components

\gamma_{a}(\theta) = \frac{\partial F_{\mathrm{cost}}}{\partial \theta_{a}} = -\sum_{k=1}^{N} \left[ y^{[k]} - f\!\left(x^{[k]}; \theta\right) \right] \left.\frac{\partial f}{\partial \theta_{a}}\right|_{x^{[k]};\, \theta} \qquad (8.78)
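Continuing the sketch above, the gradient (8.78) is assembled from the model sensitivities ∂f/∂θ_a; the finite-difference check and the synthetic data below are illustrative additions using the same hypothetical decay model as before.

```python
# Gradient (8.78): gamma_a = -sum_k ( y^[k] - f(x^[k]; theta) ) * (df/dtheta_a)|_(x^[k]; theta)
def grad_F_cost(theta, x_data, y_data):
    residuals = y_data - f_model(x_data, theta)        # y^[k] - f(x^[k]; theta)
    # Sensitivities of the illustrative model f = theta_1 * exp(-theta_2 * x)
    df_dtheta1 = np.exp(-theta[1] * x_data)
    df_dtheta2 = -theta[0] * x_data * np.exp(-theta[1] * x_data)
    J = np.column_stack((df_dtheta1, df_dtheta2))      # N x 2 matrix of df/dtheta_a
    return -(J.T @ residuals)

# Quick consistency check of (8.78) against central finite differences.
def fd_gradient(theta, x_data, y_data, h=1e-6):
    g = np.zeros_like(theta)
    for a in range(theta.size):
        e = np.zeros_like(theta)
        e[a] = h
        g[a] = (F_cost(theta + e, x_data, y_data)
                - F_cost(theta - e, x_data, y_data)) / (2 * h)
    return g

# Illustrative usage with noise-free synthetic data.
x_data = np.linspace(0.0, 5.0, 20)
y_data = f_model(x_data, np.array([2.0, 0.7]))
theta_test = np.array([1.5, 0.5])
print(grad_F_cost(theta_test, x_data, y_data))
print(fd_gradient(theta_test, x_data, y_data))         # should agree closely
```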
We find a local minimum by applying either the nonlinear conjugate gradient method or,
as below, a variation of Newton’s method. For the latter technique, we use the Hessian