Using the likelihood function that follows from the Gauss–Markov conditions and the
assumption of normally distributed errors, we have
$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left[-\frac{S(\theta)}{2\sigma^2}\right] p(\theta)\,p(\sigma) \tag{8.86} $$
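As a concrete aid, the unnormalized posterior (8.86) translates directly into a few lines of code. The following Python sketch (the names model, log_prior_theta, and log_prior_sigma are illustrative assumptions, not notation from the text) evaluates the log of the right-hand side for user-supplied priors; working in log space avoids underflow when S(θ)/(2σ²) is large.

import numpy as np

def log_posterior_kernel(theta, sigma, y, model, log_prior_theta, log_prior_sigma):
    # Unnormalized log of (8.86):
    #   -N log(sigma) - S(theta)/(2 sigma^2) + log p(theta) + log p(sigma)
    resid = y - model(theta)          # residuals of the fit
    S = resid @ resid                 # S(theta), the sum of squared errors
    N = y.size
    return (-N * np.log(sigma)        # from the sigma^(-N) factor
            - S / (2.0 * sigma**2)
            + log_prior_theta(theta)
            + log_prior_sigma(sigma))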
How do we propose priors p(θ) and p(σ)?
Let θ_MLE = θ_LS be the maximum likelihood estimate of θ, i.e., the value that maximizes the likelihood by minimizing S(θ). Let us now write S(θ) as
$$ S(\theta) = \left[S(\theta) - S(\theta_{LS})\right] + S(\theta_{LS}) \tag{8.87} $$
so that the posterior becomes
$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left\{-\frac{1}{2\sigma^2}\left[S(\theta) - S(\theta_{LS})\right]\right\} \exp\left[-\frac{S(\theta_{LS})}{2\sigma^2}\right] p(\theta)\,p(\sigma) \tag{8.88} $$
We now define the sample variance s²:
$$ s^2 = \frac{1}{\nu} S(\theta_{LS}), \qquad \nu = N - \dim(\theta) \tag{8.89} $$
In the supplemental material on the accompanying website, we show that if the Gauss–Markov conditions (8.54) hold, s² provides an unbiased estimate of σ²; that is, if we redo the set of experiments many times and average the computed s² from each, E[s²] = σ².
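This unbiasedness claim can be checked quickly by Monte Carlo. The sketch below assumes a toy linear model y = Xθ + ε with known σ (all settings are illustrative); it repeatedly simulates data, fits θ by least squares, and averages s² = S(θ_LS)/ν:

import numpy as np

rng = np.random.default_rng(0)
N, dim_theta, sigma_true = 50, 3, 0.7      # illustrative problem size
nu = N - dim_theta                         # nu = N - dim(theta), eq. (8.89)
X = rng.normal(size=(N, dim_theta))
theta_true = np.array([1.0, -2.0, 0.5])

s2_values = []
for _ in range(20000):
    y = X @ theta_true + sigma_true * rng.normal(size=N)
    # lstsq returns S(theta_LS), the minimized sum of squared residuals
    _, S_ls, _, _ = np.linalg.lstsq(X, y, rcond=None)
    s2_values.append(S_ls[0] / nu)         # s^2 = S(theta_LS)/nu

print(np.mean(s2_values), sigma_true**2)   # the two agree closely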
Using the definition of the sample variance, the posterior becomes
$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left\{-\frac{1}{2\sigma^2}\left[S(\theta) - S(\theta_{LS})\right]\right\} \exp\left(-\frac{\nu s^2}{2\sigma^2}\right) p(\theta)\,p(\sigma) \tag{8.90} $$
We now define a likelihood function for σ given s,
$$ l(\sigma \mid s) \propto \sigma^{-N} \exp\left(-\frac{\nu s^2}{2\sigma^2}\right) \tag{8.91} $$
and a “conditional likelihood” function for θ given y and σ,
$$ l(\theta \mid y,\sigma) \propto \exp\left\{-\frac{1}{2\sigma^2}\left[S(\theta) - S(\theta_{LS})\right]\right\} \tag{8.92} $$
The posterior density then partitions into two contributions,
$$ p(\theta,\sigma \mid y) \propto \left[l(\theta \mid y,\sigma)\,p(\theta)\right] \times \left[l(\sigma \mid s)\,p(\sigma)\right] \tag{8.93} $$
We now consider each contribution independently, searching for priors that seem most satisfying in the sense of being reproducible by different analysts.
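Because S(θ) = [S(θ) − S(θ_LS)] + νs², the partition (8.93) is an exact algebraic identity for the likelihood kernels. A minimal numerical check, assuming a one-parameter toy model y_i = θx_i + ε_i (all values illustrative):

import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.linspace(1.0, 2.0, N)
y = 1.5 * x + 0.3 * rng.normal(size=N)     # synthetic data

theta_ls = (x @ y) / (x @ x)               # least-squares estimate
S = lambda th: np.sum((y - th * x) ** 2)   # S(theta)
nu = N - 1                                 # one fitted parameter
s2 = S(theta_ls) / nu                      # eq. (8.89)

theta, sigma = 1.4, 0.35                   # arbitrary test point
full = sigma**(-N) * np.exp(-S(theta) / (2 * sigma**2))        # kernel in (8.86)
l_theta = np.exp(-(S(theta) - S(theta_ls)) / (2 * sigma**2))   # eq. (8.92)
l_sigma = sigma**(-N) * np.exp(-nu * s2 / (2 * sigma**2))      # eq. (8.91)
print(np.isclose(full, l_theta * l_sigma))  # True: the kernels factor exactly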
Noninformative prior for θ
We begin with the “conditional posterior” for θ, treating σ as known,
$$ p(\theta \mid y,\sigma) \propto l(\theta \mid y,\sigma)\,p(\theta) = \exp\left\{-\frac{1}{2\sigma^2}\left[S(\theta) - S(\theta_{LS})\right]\right\} p(\theta) \tag{8.94} $$
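For a model that is linear in θ, S(θ) − S(θ_LS) is exactly the quadratic form (θ − θ_LS)ᵀXᵀX(θ − θ_LS), so the conditional likelihood in (8.94) is a Gaussian kernel in θ centered at θ_LS; with a locally uniform p(θ), the conditional posterior is then a normal density with mean θ_LS and covariance σ²(XᵀX)⁻¹. A sketch checking the quadratic identity numerically (data and dimensions are illustrative):

import numpy as np

rng = np.random.default_rng(2)
N, dim_theta, sigma = 30, 2, 0.5
X = rng.normal(size=(N, dim_theta))
y = X @ np.array([2.0, -1.0]) + sigma * rng.normal(size=N)

theta_ls = np.linalg.solve(X.T @ X, X.T @ y)    # least-squares estimate
S = lambda th: np.sum((y - X @ th) ** 2)        # S(theta)

th = theta_ls + np.array([0.1, -0.2])           # arbitrary offset from theta_LS
quad = (th - theta_ls) @ (X.T @ X) @ (th - theta_ls)
print(np.isclose(S(th) - S(theta_ls), quad))    # True for a linear model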