Page 401 - Numerical Methods for Chemical Engineering

390     8 Bayesian statistics and parameter estimation



Using the likelihood function that follows from the Gauss–Markov conditions and the assumption of normally distributed errors, we have

$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left[-\frac{1}{2\sigma^2}\,S(\theta)\right] p(\theta)\,p(\sigma) \qquad (8.86) $$
                   How do we propose priors p(θ) and p(σ)?
Let $\theta_{\mathrm{MLE}} = \theta_{\mathrm{LS}}$ be the maximum likelihood estimate of $\theta$, i.e., the value that maximizes the likelihood by minimizing $S(\theta)$. Let us now write $S(\theta)$ as

$$ S(\theta) = [S(\theta) - S(\theta_{\mathrm{LS}})] + S(\theta_{\mathrm{LS}}) \qquad (8.87) $$
so that the posterior becomes

$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left[-\frac{S(\theta) - S(\theta_{\mathrm{LS}})}{2\sigma^2}\right] \exp\left[-\frac{S(\theta_{\mathrm{LS}})}{2\sigma^2}\right] p(\theta)\,p(\sigma) \qquad (8.88) $$
We now define the sample variance $s^2$:

$$ s^2 = \frac{1}{\nu}\,S(\theta_{\mathrm{LS}}), \qquad \nu = N - \dim(\theta) \qquad (8.89) $$
In the supplemental material on the accompanying website, we show that if the Gauss–Markov conditions (8.54) hold, $s^2$ provides an unbiased estimate of $\sigma^2$. That is, if we redo the set of experiments many times and average the computed $s^2$ from each, $E[s^2] = \sigma^2$.
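As a concrete illustration of Eq. (8.89), the following sketch fits an assumed linear model to synthetic data and computes the least-squares estimate and the sample variance; the data, model matrix, and "true" parameters are invented for this example and do not come from the text.

```python
import numpy as np

# Illustrative sketch (synthetic data, assumed linear model y = X @ theta):
# compute the least-squares estimate theta_LS and the sample variance
# s^2 = S(theta_LS) / nu of Eq. (8.89).
rng = np.random.default_rng(0)
N = 50                                   # number of observations
X = np.column_stack([np.ones(N), np.linspace(0.0, 10.0, N)])
theta_true = np.array([1.0, 0.5])        # hypothetical "true" parameters
y = X @ theta_true + rng.normal(0.0, 0.3, N)

# theta_LS minimizes S(theta), the sum of squared residuals
theta_LS, *_ = np.linalg.lstsq(X, y, rcond=None)

S_LS = np.sum((y - X @ theta_LS) ** 2)   # S(theta_LS)
nu = N - X.shape[1]                      # nu = N - dim(theta)
s2 = S_LS / nu                           # unbiased estimate of sigma^2
print(theta_LS, s2)
```

Averaging `s2` over many repeated synthetic data sets would converge toward the noise variance used to generate `y`, which is the unbiasedness property stated above.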
Using the definition of the sample variance, the posterior becomes

$$ p(\theta,\sigma \mid y) \propto \sigma^{-N} \exp\left[-\frac{S(\theta) - S(\theta_{\mathrm{LS}})}{2\sigma^2}\right] \exp\left[-\frac{\nu s^2}{2\sigma^2}\right] p(\theta)\,p(\sigma) \qquad (8.90) $$
We now define a likelihood function for $\sigma$ given $s$,

$$ l(\sigma \mid s) \propto \sigma^{-N} \exp\left[-\frac{\nu s^2}{2\sigma^2}\right] \qquad (8.91) $$
and a “conditional likelihood” function for $\theta$ given $y$ and $\sigma$,

$$ l(\theta \mid y,\sigma) \propto \exp\left[-\frac{S(\theta) - S(\theta_{\mathrm{LS}})}{2\sigma^2}\right] \qquad (8.92) $$
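The factor $l(\sigma \mid s)$ of Eq. (8.91) can be explored numerically; working in log form avoids underflow from the $\sigma^{-N}$ term. In this sketch, $N$, $\nu$, and $s^2$ are assumed illustrative values, not data from the text. Setting the derivative of the log likelihood to zero gives the maximizer $\sigma^2 = \nu s^2 / N$, which a grid search reproduces.

```python
import numpy as np

# Assumed illustrative values for N, nu = N - dim(theta), and s^2
N, nu, s2 = 50, 48, 0.09

def log_l_sigma(sigma):
    # log l(sigma|s) = -N log(sigma) - nu s^2 / (2 sigma^2) + const, Eq. (8.91)
    return -N * np.log(sigma) - nu * s2 / (2.0 * sigma**2)

# Grid search for the maximizer; compare with the analytic sqrt(nu s^2 / N)
sigma_grid = np.linspace(0.05, 1.0, 4001)
sigma_hat = sigma_grid[np.argmax(log_l_sigma(sigma_grid))]
print(sigma_hat, np.sqrt(nu * s2 / N))
```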
The posterior density then partitions into two contributions,

$$ p(\theta,\sigma \mid y) \propto [l(\theta \mid y,\sigma)\,p(\theta)] \times [l(\sigma \mid s)\,p(\sigma)] \qquad (8.93) $$
                   We now consider each contribution independently to search for priors that seem most
                   satisfying, in terms of being reproducible by different analysts.
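The partition of Eq. (8.93) can be checked numerically. With flat priors, the log posterior of Eq. (8.90) should equal $\log l(\theta \mid y,\sigma) + \log l(\sigma \mid s)$ up to an additive constant; in the sketch below, which uses an assumed linear model and synthetic data, both sides drop the same constants, so the match is exact since $\nu s^2 = S(\theta_{\mathrm{LS}})$ by construction.

```python
import numpy as np

# Synthetic data and assumed linear model, for illustration only
rng = np.random.default_rng(1)
N = 40
X = np.column_stack([np.ones(N), np.linspace(0.0, 5.0, N)])
y = X @ np.array([2.0, -0.7]) + rng.normal(0.0, 0.2, N)

theta_LS, *_ = np.linalg.lstsq(X, y, rcond=None)
def S(th):                                # sum-of-squares objective
    return np.sum((y - X @ th) ** 2)
S_LS = S(theta_LS)
nu = N - X.shape[1]
s2 = S_LS / nu                            # so nu * s2 = S(theta_LS)

def log_post(th, sigma):                  # Eq. (8.90), flat priors, log form
    return (-N * np.log(sigma)
            - (S(th) - S_LS) / (2 * sigma**2)
            - nu * s2 / (2 * sigma**2))

def log_l_theta(th, sigma):               # Eq. (8.92)
    return -(S(th) - S_LS) / (2 * sigma**2)

def log_l_sigma(sigma):                   # Eq. (8.91)
    return -N * np.log(sigma) - nu * s2 / (2 * sigma**2)

# Evaluate both sides of the partition at an arbitrary (theta, sigma)
th, sig = np.array([1.8, -0.6]), 0.25
lhs = log_post(th, sig)
rhs = log_l_theta(th, sig) + log_l_sigma(sig)
print(np.isclose(lhs, rhs))
```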


                   Noninformative prior for θ
We begin with the “conditional posterior” for $\theta$, treating $\sigma$ as known,

$$ p(\theta \mid y,\sigma) \propto l(\theta \mid y,\sigma)\,p(\theta) = \exp\left[-\frac{S(\theta) - S(\theta_{\mathrm{LS}})}{2\sigma^2}\right] p(\theta) \qquad (8.94) $$