Numerical Methods for Chemical Engineering
8 Bayesian statistics and parameter estimation
Model criticism and selection
Similar techniques are used to judge which of several models best fits the data. Let us propose several models $\alpha = 1, 2, \ldots$, one of which is believed to be the “true” model. Let $M_\alpha$ be the event that model $\alpha$ is the true one, and let $Y$ be the event that we observe the particular data set $Y$. Then Bayes’ theorem, applied to the joint probability $P(M_\alpha \cap Y)$, yields the posterior probability that model $\alpha$ is the true one, given the data $Y$:
$$P(M_\alpha|Y) = \frac{P(Y|M_\alpha)\,P(M_\alpha)}{\sum_{\alpha'} P(Y|M_{\alpha'})\,P(M_{\alpha'})} \tag{8.213}$$
To remove the dependence on the priors $P(M_\alpha)$, we compare models $\alpha$ and $\beta$ by computing the Bayes factor
$$B_{\alpha\beta}(Y) = \frac{P(M_\alpha|Y)/P(M_\alpha)}{P(M_\beta|Y)/P(M_\beta)} = \frac{P(Y|M_\alpha)}{P(Y|M_\beta)} \tag{8.214}$$
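Once the evidences $P(Y|M_\alpha)$ are available, (8.213) and (8.214) are simple to evaluate. The sketch below is a minimal illustration, assuming the evidences are supplied as natural logarithms (they are usually far too small to store directly); the function names and the three log-evidence values are hypothetical. The denominator of (8.213) is formed with a log-sum-exp so that it does not underflow.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_model_probs(log_evidence, prior):
    """Eq. (8.213): P(M_a|Y) from log P(Y|M_a) and prior P(M_a)."""
    log_terms = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    log_norm = log_sum_exp(log_terms)  # log of the sum over all models
    return [math.exp(t - log_norm) for t in log_terms]


def bayes_factor(log_ev_a, log_ev_b):
    """Eq. (8.214): B_ab(Y) = P(Y|M_a) / P(Y|M_b)."""
    return math.exp(log_ev_a - log_ev_b)


# Hypothetical log-evidences for three candidate models, equal priors
log_ev = [-120.0, -118.0, -125.0]
prior = [1 / 3, 1 / 3, 1 / 3]

probs = posterior_model_probs(log_ev, prior)
print("posterior model probabilities:", probs)
print("B_21 =", bayes_factor(log_ev[1], log_ev[0]))
```

With equal priors the ratio of two posterior model probabilities equals their Bayes factor; with unequal priors it is the Bayes factor multiplied by the prior odds, which is why (8.214) isolates the part of the comparison that comes from the data.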
These probabilities $P(Y|M_\alpha)$ are the prior (marginal) probabilities of the data set under model $\alpha$, also called the evidence for model $\alpha$:
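$$P(Y|M_\alpha) = \int_{\mathcal{P}_\alpha}\int_{\Sigma_\alpha > 0} p_\alpha(Y|\theta_\alpha, \Sigma_\alpha)\, p_\alpha(\theta_\alpha, \Sigma_\alpha)\, d\Sigma_\alpha\, d\theta_\alpha \tag{8.215}$$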
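Here $\theta_\alpha \in \mathcal{P}_\alpha$ is the vector of parameters for model $\alpha$ and $\Sigma_\alpha$ is the covariance matrix of the random error. The prior is $p_\alpha(\theta_\alpha, \Sigma_\alpha)$ and the likelihood is $p_\alpha(Y|\theta_\alpha, \Sigma_\alpha)$.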
The integrals (8.215) can be computed by MCMC simulation. We generate a sequence $(\theta_{\alpha,m}, \Sigma_{\alpha,m})$ at random from a sampling density $\pi_S(\theta_\alpha, \Sigma_\alpha)$. For a number $N_s$ of samples, we approximate $P(Y|M_\alpha)$ as
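$$P(Y|M_\alpha) \approx \frac{\displaystyle\sum_{m=1}^{N_s} \frac{p_\alpha(Y|\theta_{\alpha,m}, \Sigma_{\alpha,m})\, p_\alpha(\theta_{\alpha,m}, \Sigma_{\alpha,m})}{\pi_S(\theta_{\alpha,m}, \Sigma_{\alpha,m})}}{\displaystyle\sum_{m=1}^{N_s} \frac{p_\alpha(\theta_{\alpha,m}, \Sigma_{\alpha,m})}{\pi_S(\theta_{\alpha,m}, \Sigma_{\alpha,m})}} \tag{8.216}$$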
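The estimator (8.216) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the book's implementation: a hypothetical one-parameter, single-response model $y_k = \theta x_k + \varepsilon_k$ with a known error standard deviation (so $\Sigma_\alpha$ is fixed and only $\theta$ is sampled), a normal prior, and made-up data. Draws are taken directly from $\pi_S$ rather than by MCMC, and $\pi_S$ is an assumed 50/50 mixture of the prior and a normal centred on the least-squares estimate. Because this linear-Gaussian model has a closed-form evidence, the script prints it for comparison. All sums are formed in log space.

```python
import math
import random


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normal_pdf(z, mean, sd):
    """Log of the normal density N(mean, sd^2) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((z - mean) / sd) ** 2


def log_evidence_is(samples, log_lik, log_prior, log_pi_s):
    """Eq. (8.216) in log form: log[sum(L*p/pi_S) / sum(p/pi_S)]."""
    num = [log_lik(t) + log_prior(t) - log_pi_s(t) for t in samples]
    den = [log_prior(t) - log_pi_s(t) for t in samples]
    return log_sum_exp(num) - log_sum_exp(den)


# Hypothetical model y_k = theta * x_k + eps_k, eps_k ~ N(0, SIGMA^2),
# SIGMA known, prior theta ~ N(0, TAU^2); the data are made up.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA, TAU = 1.0, 3.0


def log_lik(theta):
    return sum(log_normal_pdf(y, theta * x, SIGMA) for x, y in zip(X, Y))


def log_prior(theta):
    return log_normal_pdf(theta, 0.0, TAU)


def exact_log_evidence():
    """Closed form for this model: Y ~ N(0, SIGMA^2 I + TAU^2 x x^T)."""
    n = len(X)
    xx = sum(x * x for x in X)
    xy = sum(x * y for x, y in zip(X, Y))
    yy = sum(y * y for y in Y)
    s2, t2 = SIGMA ** 2, TAU ** 2
    log_det = n * math.log(s2) + math.log(1.0 + t2 * xx / s2)  # determinant lemma
    quad = (yy - t2 * xy ** 2 / (s2 + t2 * xx)) / s2  # Sherman-Morrison
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# Assumed sampling density pi_S: half prior, half a normal centred on the
# least-squares estimate with twice the likelihood width.
XX = sum(x * x for x in X)
THETA_LS = sum(x * y for x, y in zip(X, Y)) / XX
Q_SD = 2.0 * SIGMA / math.sqrt(XX)


def log_pi_s(theta):
    return log_sum_exp([math.log(0.5) + log_prior(theta),
                        math.log(0.5) + log_normal_pdf(theta, THETA_LS, Q_SD)])


def draw_pi_s(rng):
    """Draw one sample from the two-component mixture pi_S."""
    if rng.random() < 0.5:
        return rng.gauss(0.0, TAU)
    return rng.gauss(THETA_LS, Q_SD)


rng = random.Random(1)
N_S = 40000
samples = [draw_pi_s(rng) for _ in range(N_S)]
estimate = log_evidence_is(samples, log_lik, log_prior, log_pi_s)
print("IS estimate of log P(Y|M):", estimate)
print("exact log P(Y|M):         ", exact_log_evidence())
```

The prior component of the mixture is there because of the denominator of (8.216): that sum estimates the integral of the prior, which is only reliable if $\pi_S$ puts appreciable mass wherever the prior does. A $\pi_S$ concentrated near the posterior mode alone would leave that sum dominated by rare tail draws.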
A common choice of $\pi_S$ is the integrand of (8.215). This approach is quite general, but the sampling is carried out more easily for single-response data, for which the covariance matrix $\Sigma_\alpha$ reduces to a scalar error variance. In addition to the importance-sampling method described here, a number of other MCMC techniques tailored to the computation of Bayes factors are available (see Chen et al., 2000).
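To see what this common choice amounts to, substitute $\pi_S \propto p_\alpha(Y|\theta_\alpha, \Sigma_\alpha)\, p_\alpha(\theta_\alpha, \Sigma_\alpha)$ into (8.216). Each numerator term becomes the same constant, each denominator term becomes that constant divided by the likelihood, and the constant cancels:

$$P(Y|M_\alpha) \approx \frac{N_s}{\displaystyle\sum_{m=1}^{N_s} \frac{1}{p_\alpha(Y|\theta_{\alpha,m}, \Sigma_{\alpha,m})}} = \left[\frac{1}{N_s}\sum_{m=1}^{N_s} \frac{1}{p_\alpha(Y|\theta_{\alpha,m}, \Sigma_{\alpha,m})}\right]^{-1}$$

That is, the estimate is the harmonic mean of the likelihood values at the sampled points, which are then draws from the posterior. Because a few draws with small likelihood can dominate the sum of reciprocals, this estimate can have a very large variance.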
Schwarz’s Bayesian information criterion (BIC)
Here we provide an approximation of the Bayes factor for single-response data that does not require MCMC simulation. Let the sum of squared errors for model $\alpha$ be
$$S_\alpha(\theta_\alpha) = \sum_{k=1}^{N} \left[ y_k^{(\alpha)}(\theta_\alpha) - y_k \right]^2 \tag{8.217}$$
where the model prediction for experiment $k$ is $y_k^{(\alpha)}(\theta_\alpha)$ and the observed responses are