of p(y|θ,σ) and the prior p(θ,σ):

    p(θ,σ|y) ∝ p(y|θ,σ) p(θ,σ)          (8.63)
Note that p(y|θ,σ) is the probability of observing a particular response y, given specified values of θ and σ. But, at the time of analysis, it is y that is known and θ and σ that are unknown. It is common practice to switch the order of the arguments and so define the likelihood function for θ and σ, given specified y,

    l(θ,σ|y) ≡ p(y|θ,σ) = (2π)^(−N/2) σ^(−N) exp[−S(θ)/(2σ²)]          (8.64)

While we switch the order of the arguments, the meaning of this quantity remains unchanged – it is the probability that the measured response is y, given specified values of θ and σ.
The posterior density is then written as

    p(θ,σ|y) ∝ l(θ,σ|y) p(θ,σ)          (8.65)

In the Bayesian approach, we take as estimates θ_M, σ_M the values that maximize the posterior density p(θ,σ|y). The frequentist rule is to take the value θ_MLE that maximizes the likelihood function l(θ,σ|y). If the prior p(θ,σ) is not uniform in θ, the Bayesian most probable estimate θ_M and the maximum likelihood estimate θ_MLE disagree.
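To make the distinction concrete, here is a minimal numerical sketch (the one-parameter model, the synthetic data, and the N(0, 1) prior are illustrative assumptions, not taken from the text). With σ treated as known and a Gaussian, hence non-uniform, prior on θ, maximizing the log-posterior and the log-likelihood yields different estimates:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Hypothetical model: y_i = theta + noise, noise ~ N(0, sigma^2), sigma known
    rng = np.random.default_rng(1)
    sigma = 1.0
    y = 2.0 + sigma * rng.standard_normal(5)    # small synthetic data set
    N = y.size

    def S(theta):
        # sum of squared residuals S(theta), as in (8.64)
        return np.sum((y - theta)**2)

    def neg_log_likelihood(theta):
        # -log l(theta, sigma | y), dropping theta-independent constants
        return S(theta) / (2.0 * sigma**2)

    def neg_log_posterior(theta):
        # -log p(theta, sigma | y) from (8.65), with the assumed N(0, 1) prior on theta
        return neg_log_likelihood(theta) + 0.5 * theta**2

    theta_MLE = minimize_scalar(neg_log_likelihood).x    # frequentist estimate
    theta_M = minimize_scalar(neg_log_posterior).x       # Bayesian MAP estimate
    print(theta_MLE, theta_M)    # theta_M is shrunk toward the prior mean, 0

For this sketch the two estimates are also available analytically: θ_MLE is the sample mean, while θ_M is the sample mean multiplied by N/(N + σ²) under the unit-variance prior, so the disagreement shrinks as the data set grows.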


Some general considerations about the selection of a prior

Clearly, the choice of the prior is crucial, as it influences our analysis. It is this subjective nature of the Bayesian approach that is the cause of controversy, since in statistics we would like to think that different people who look at the data will come to the same conclusions. We hope that the data will be sufficiently informative that the likelihood function is sharply peaked around specific values of θ and σ; i.e., that the inference problem is data-dominated. In this case, the estimates are rather insensitive to the choice of prior, as long as it is nonzero near the peak of the likelihood function. When this is not the case, the prior influences the results of the analysis.
In some problems, the explicit dependence upon a prior is quite useful. If we know a priori that certain regions of the parameter space are inadmissible (e.g. certain parameters must be nonnegative), then the prior can be set to zero in those regions to exclude them from consideration.
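For instance, a nonnegativity constraint on a parameter can be imposed directly through the log-prior. The sketch below (function names and structure are hypothetical, not from the text) returns −∞ outside the admissible region, so any optimizer or sampler working with the log-posterior automatically excludes those points:

    import numpy as np

    def log_prior(theta):
        # flat prior restricted to theta >= 0; zero prior probability elsewhere
        return 0.0 if theta >= 0.0 else -np.inf

    def log_posterior(theta, log_likelihood):
        # log p(theta|y) = log l(theta|y) + log p(theta) + const., cf. (8.65)
        lp = log_prior(theta)
        if not np.isfinite(lp):
            return -np.inf    # inadmissible region: skip the likelihood entirely
        return lp + log_likelihood(theta)

Checking the prior first has the side benefit that the model is never evaluated at unphysical parameter values, where it may be undefined.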
Using an explicit prior also allows us to blend learning from different sets of data in a seamless manner. Let us say that we measure a response vector y₁ in some set of experiments, and compute the posterior density

    p(θ|y₁) ∝ l₁(θ|y₁) p(θ)          (8.66)
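For a scalar parameter, (8.66) can be evaluated directly on a grid. The sketch below uses a hypothetical model with made-up data and an assumed known σ; its last lines anticipate the sequential reuse of the first posterior described in the next paragraph:

    import numpy as np

    theta = np.linspace(-5.0, 5.0, 1001)    # grid over the scalar parameter
    sigma = 1.0                             # assumed known

    def likelihood(y):
        # Gaussian likelihood for y_i = theta + noise, cf. (8.64), on the grid
        S = ((y[:, None] - theta[None, :])**2).sum(axis=0)
        return np.exp(-S / (2.0 * sigma**2))

    prior = np.ones_like(theta)             # flat initial prior p(theta)

    y1 = np.array([1.8, 2.3, 2.1])          # first set of experiments (made up)
    post1 = likelihood(y1) * prior          # unnormalized posterior, eq. (8.66)
    post1 /= post1.sum()                    # normalize on the grid

    y2 = np.array([2.0, 1.9])               # later experiments on the same system
    post2 = likelihood(y2) * post1          # the first posterior acts as the prior
    post2 /= post2.sum()

Because independent likelihoods simply multiply, post2 agrees (up to normalization) with the posterior obtained by analyzing y₁ and y₂ jointly.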
We later perform a new set of experiments on the same system and measure a response vector y₂. When we perform the analysis on these new data, an apparent choice of prior for the second set of experiments is the posterior density from the first. The posterior density