414 8 Bayesian statistics and parameter estimation
linearized design matrix. In cases where it may be very costly to come back later to perform
additional experiments, we may wish to try multiple estimates of θ, repeat the eigenvalue
analysis for each linearized design matrix, and accept only a design that appears to provide
sufficient accuracy for all plausible values of θ.
Example. Determining the number of additional experiments
necessary for the protein expression data
We consider once again the data for the protein expression levels of wild-type and mutant bacterial strains (8.35), with \(X^{\mathrm{T}}X\) and its inverse again given by (8.40). For a specified σ, the standard deviation of θ₂ is
\[
\mathrm{std}(\theta_2) = \sigma\sqrt{\left[\left(X^{\mathrm{T}}X\right)^{-1}\right]_{22}} = \sigma\sqrt{2n^{-1}} \tag{8.176}
\]
The expected width of the confidence interval in this parameter is then
\[
\left|\theta_2 - \theta_{M,2}\right| \approx Z_{\alpha/2}\,\sigma\sqrt{2n^{-1}} \tag{8.177}
\]
Or, to account roughly for the extra uncertainty in σ, we could use
\[
\left|\theta_2 - \theta_{M,2}\right| \approx T_{n-2,\alpha/2}\,s\sqrt{2n^{-1}} \tag{8.178}
\]
We can use (8.178) with n = 4 + m and the s-value from the existing data to estimate the
number m of additional experiments necessary to reduce the uncertainty in θ 2 to a desired
level.
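As a minimal sketch of this search for m, the loop below increases m until the predicted half-width falls below a target. To stay within the Python standard library it uses the normal-quantile form (8.177); with `scipy.stats.t.ppf` one would substitute the \(T_{n-2,\alpha/2}\) quantile of (8.178) instead. The numerical values of the standard deviation and the target width are illustrative assumptions, not taken from the text.

```python
# Sketch: estimate the number m of additional experiments needed to shrink
# the confidence interval for theta_2, using the normal-quantile form (8.177):
#     half-width ~ Z_{alpha/2} * sigma * sqrt(2/n),   n = 4 + m.
# (With scipy.stats.t.ppf available, one would use T_{n-2,alpha/2} per (8.178).)
# sigma and target below are illustrative assumptions, not values from the text.
import math
from statistics import NormalDist

sigma = 0.25    # standard deviation estimate from existing data (assumed)
alpha = 0.05    # 95% confidence level
target = 0.3    # desired half-width of the confidence interval (assumed)

z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.96
m = 0
while z * sigma * math.sqrt(2.0 / (4 + m)) > target:
    m += 1

n = 4 + m
print(f"m = {m} additional experiments (n = {n}, "
      f"half-width = {z * sigma * math.sqrt(2.0 / n):.3f})")
```

Because the t quantile exceeds the normal quantile at small n, the t-based count from (8.178) would be somewhat larger; the loop structure is identical.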
Here, our emphasis has been upon experimental design; however, eigenvalue analysis and SVD of the design matrix can also be used to extract at least partial results when \(X^{\mathrm{T}}X\) is singular. This subject is discussed in further detail in the supplemental material on the accompanying website.
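To illustrate the idea (the supplemental treatment itself is not reproduced here), the sketch below uses the SVD to form the minimum-norm least-squares estimate when \(X^{\mathrm{T}}X\) is singular: only the parameter directions with nonzero singular values are resolved by the data. The small rank-deficient design matrix is an invented example, chosen so that one column duplicates another.

```python
# Sketch: when X^T X is singular, the SVD still yields the minimum-norm
# least-squares estimate, resolving only the parameter combinations that
# the data can identify. The design matrix below is an illustrative
# assumption: its third column is twice its second, so X^T X is singular.
import numpy as np

X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
tol = sv.max() * max(X.shape) * np.finfo(float).eps
rank = int((sv > tol).sum())      # number of resolvable parameter directions

# Minimum-norm solution: invert only the singular values above tolerance
theta = Vt[:rank].T @ ((U[:, :rank].T @ y) / sv[:rank])
print(rank, theta)
```

Here only two of the three parameter directions are identifiable; the SVD-based estimate returns the solution of smallest norm among all parameter vectors that fit the data equally well (equivalently, `np.linalg.pinv(X) @ y`).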
Bayesian multiresponse regression
Previously, we have considered only the analysis of single-response data. Here, we discuss
multiresponse regression, focusing primarily upon the extension of the least-squares method
to the case of multiple, perhaps correlated, responses in each experiment.
Again, we perform a number N of experiments, where in the kth experiment we have a known set of M predictor variables, \(x^{[k]} \in \mathbb{R}^M\), and we observe the L responses \(y^{[k]} \in \mathbb{R}^L\). We wish to estimate the values of P unknown parameters \(\theta \in \mathbb{R}^P\) in a model whose predicted responses for each experiment form a vector \(f(x^{[k]};\theta) \in \mathbb{R}^L\). We assume that the measured responses are equal to the model predictions plus a random error vector,
\[
y^{[k]} = f\left(x^{[k]};\theta\right) + \varepsilon^{[k]} \tag{8.179}
\]
L
ε [k] ∈ is assumed to be independent of the other ε [l =k] , but we allow that the components
of ε [k] may be correlated. The L × L covariance matrix (unknown) of each error vector is