The least-squares method reconsidered

Our proposed prior (8.68) makes use of the assumption of prior independence,

    p(θ, σ) = p(θ) p(σ)                                                      (8.72)

which states that, prior to the experiment, the belief systems about θ and σ are independent. The posterior density is then

    p(θ, σ | y) ∝ l(θ, σ | y) p(θ) p(σ)                                      (8.73)

Assuming the Gauss–Markov conditions hold and that the errors are normally distributed, the likelihood function is

    l(θ, σ | y) = (1/√(2π))^N σ^(−N) exp[ −(1/(2σ²)) S(θ) ]                  (8.74)

and thus the posterior is

    p(θ, σ | y) ∝ (1/√(2π))^N σ^(−N) exp[ −(1/(2σ²)) S(θ) ] p(θ) p(σ)        (8.75)
If, as we have assumed in (8.68), the prior is uniform in the region of appreciable nonzero likelihood, p(θ) ∼ c, then the most probable value of θ, for any value of σ, is that which minimizes S(θ) of (8.58). Therefore, the least-squares method is justified statistically, as long as the Gauss–Markov conditions hold and the errors are normally distributed.
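The equivalence can be checked directly: for fixed σ, the logarithm of (8.75) with a flat prior equals −S(θ)/(2σ²) plus terms that do not depend on θ, so maximizing the posterior over θ is the same as minimizing S(θ). Below is a minimal numerical sketch of this point; the linear model, data, and grid scan are hypothetical choices made only for illustration.

    # Sketch: with a flat prior and Gaussian errors, the most probable theta
    # (for fixed sigma) coincides with the least-squares estimate.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    theta_true, sigma = 2.5, 0.1
    y = theta_true * x + sigma * rng.standard_normal(x.size)   # y[k] = x[k]*theta + eps[k]

    def S(theta):
        """Sum of squared residuals S(theta)."""
        return np.sum((y - theta * x) ** 2)

    def log_posterior(theta, sigma=0.1):
        """Log of (8.75), up to an additive constant, with uniform p(theta), p(sigma)."""
        return -y.size * np.log(sigma) - S(theta) / (2.0 * sigma**2)

    # Scan a grid of theta values: the maximizer of the log-posterior is the
    # same grid point as the minimizer of S(theta), for any fixed sigma.
    grid = np.linspace(1.0, 4.0, 2001)
    i_map = np.argmax([log_posterior(t) for t in grid])
    i_ls = np.argmin([S(t) for t in grid])
    print(grid[i_map], grid[i_ls])   # identical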
It is shown in the supplemental material on the accompanying website that, in the frequentist view, least squares is an unbiased estimator of the true value (i.e., if we repeat the set of experiments many times, the average estimate is the true value) provided only that the zero-mean Gauss–Markov condition (8.50) is satisfied.
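This unbiasedness is easy to probe numerically. The following sketch repeats a simulated experiment many times for a hypothetical scalar linear model with zero-mean but non-Gaussian (uniform) errors, chosen to emphasize that only the zero-mean condition (8.50) is needed, and compares the average least-squares estimate to the true parameter.

    # Monte Carlo sketch of least-squares unbiasedness (hypothetical example).
    # Model: y[k] = theta_true * x[k] + eps[k], with zero-mean uniform errors.
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 1.0, 10)
    theta_true = 3.0
    n_repeat = 20000

    estimates = np.empty(n_repeat)
    for j in range(n_repeat):
        eps = rng.uniform(-0.5, 0.5, size=x.size)     # zero-mean, non-Gaussian errors
        y = theta_true * x + eps
        estimates[j] = np.dot(x, y) / np.dot(x, x)    # scalar least-squares estimate

    print(estimates.mean())   # averages out to approximately theta_true = 3.0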


Numerical treatment of nonlinear least-squares problems
For a linear model y^[k] = x^[k] · θ + ε^[k], we obtain the least-squares estimate by solving an algebraic system [XᵀX] θ_LS = Xᵀy. For a nonlinear model

    y^[k] = f(x^[k]; θ) + ε^[k]                                              (8.76)
we must find the least-squares estimate through numerical optimization. For notational convenience, we define the cost function as

    F_cost(θ) = (1/2) S(θ) = (1/2) Σ_{k=1}^{N} [ y^[k] − f(x^[k]; θ) ]²       (8.77)
The gradient γ = ∇F_cost of this cost function has components

    γ_a(θ) = ∂F_cost/∂θ_a = − Σ_{k=1}^{N} [ y^[k] − f(x^[k]; θ) ] (∂f/∂θ_a)|_{x^[k]; θ}   (8.78)
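As a concrete check of (8.77) and (8.78), the sketch below evaluates F_cost and its analytic gradient for a hypothetical two-parameter exponential decay model, f(x; θ) = θ₁ exp(−θ₂ x) (indexed theta[0], theta[1] in the code), on synthetic data, and compares the analytic gradient with central finite differences; the model and data are assumptions made only for this example.

    # Sketch of the cost function (8.77) and its gradient (8.78) for a
    # hypothetical model f(x; theta) = theta[0] * exp(-theta[1] * x).
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 2.0, 30)
    theta_true = np.array([1.5, 0.8])
    y = theta_true[0] * np.exp(-theta_true[1] * x) + 0.02 * rng.standard_normal(x.size)

    def f(x, theta):
        return theta[0] * np.exp(-theta[1] * x)

    def F_cost(theta):
        """F_cost(theta) = S(theta)/2, equation (8.77)."""
        r = y - f(x, theta)
        return 0.5 * np.dot(r, r)

    def grad_F(theta):
        """Gradient components gamma_a of (8.78)."""
        r = y - f(x, theta)                              # residuals y[k] - f(x[k]; theta)
        df_d0 = np.exp(-theta[1] * x)                    # df/dtheta_0
        df_d1 = -theta[0] * x * np.exp(-theta[1] * x)    # df/dtheta_1
        return np.array([-np.dot(r, df_d0), -np.dot(r, df_d1)])

    # Central finite-difference check of the analytic gradient.
    theta0, eps = np.array([1.0, 1.0]), 1e-6
    fd = np.array([(F_cost(theta0 + eps * np.eye(2)[a]) - F_cost(theta0 - eps * np.eye(2)[a]))
                   / (2 * eps) for a in range(2)])
    print(grad_F(theta0), fd)   # should agree to about 1e-6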
We find a local minimum by applying either the nonlinear conjugate gradient method or, as below, a variation of Newton's method. For the latter technique, we use the Hessian