



The posterior density after having conducted both sets of experiments is


$$ p(\theta \mid y^{(1)}, y^{(2)}) \propto l(\theta \mid y^{(2)})\, p(\theta \mid y^{(1)}) \qquad (8.67) $$
                  Bayes’ theorem provides a framework for learning from new experiences.
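This sequential-updating property is easy to check numerically. The sketch below uses a conjugate normal model with known variance, chosen purely for illustration (it is not a model from this chapter): updating the prior first with $y^{(1)}$ and then with $y^{(2)}$ yields the same posterior as a single update with the pooled data, as (8.67) implies.

```python
import numpy as np

def normal_update(mu0, tau0, y, sigma):
    """Conjugate update for the mean of a normal model with known sigma.

    Prior: theta ~ N(mu0, tau0^2); likelihood: y_i ~ N(theta, sigma^2).
    Returns the posterior mean and posterior standard deviation.
    """
    prec = 1.0 / tau0**2 + len(y) / sigma**2            # posterior precision
    mu = (mu0 / tau0**2 + np.sum(y) / sigma**2) / prec  # posterior mean
    return mu, prec**-0.5

rng = np.random.default_rng(0)
sigma = 1.0
y1 = rng.normal(2.0, sigma, size=20)   # first set of experiments
y2 = rng.normal(2.0, sigma, size=30)   # second set of experiments

# Update with y1, then use that posterior as the prior for y2 ...
mu_a, tau_a = normal_update(0.0, 10.0, y1, sigma)
mu_a, tau_a = normal_update(mu_a, tau_a, y2, sigma)

# ... which matches a single update with the pooled data, as in (8.67).
mu_b, tau_b = normal_update(0.0, 10.0, np.concatenate([y1, y2]), sigma)
print(mu_a, mu_b)   # agree to machine precision
```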

                    Finally, in some instances we may wish to use a prior based upon past experience with
                  similar systems. While such priors violate the ideal of objectivity in statistical analysis, they
                  can provide better estimates if the subjective prior is chosen well (i.e., the “expert” choosing
                  the prior knows what he is talking about). Several approaches to constructing priors to agree
with preexisting conditions, data, or experience are discussed in Robert (2001).
                    Here, we consider a method for generating priors that is “objective” in the sense that
                  as long as different analysts agree to use this approach, they will independently come to
                  (nearly) the same statistical conclusions given only the predictors, response data, model, and
                  likelihood function. The basic technique, discussed below, is to identify a data-translation,
                  or symmetry, property that the likelihood function has, and choose the prior so that the
                  posterior density retains the same property. Such a prior does not give undue emphasis to
                  any particular region of parameter space and thus is said to be noninformative. As we later
                  show, for the single-response likelihood (8.64), a noninformative prior is

$$ p(\theta, \sigma) = p(\theta)\, p(\sigma) \qquad p(\theta) \propto c \qquad p(\sigma) \propto \sigma^{-1} \qquad (8.68) $$

As we use a prior that is uniform in $\theta$, the Bayesian most probable estimate $\theta_M$ agrees with the maximum likelihood estimate $\theta_{\mathrm{MLE}}$.
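A small numerical check of this agreement, for a hypothetical scalar model $y_i \sim N(\theta, \sigma^2)$ with $\sigma$ known, using a grid evaluation of the densities:

```python
import numpy as np

# Hypothetical scalar example: y_i ~ N(theta, sigma^2) with sigma known
rng = np.random.default_rng(1)
sigma = 0.5
y = rng.normal(3.0, sigma, size=25)

theta = np.linspace(0.0, 6.0, 2001)    # grid over the parameter
loglik = -0.5 * np.sum((y[:, None] - theta) ** 2, axis=0) / sigma**2

log_prior = np.zeros_like(theta)       # uniform prior: p(theta) ~ c
log_post = loglik + log_prior          # log posterior, up to a constant

# With a flat prior, the posterior mode theta_M equals theta_MLE.
print(theta[np.argmax(log_post)], theta[np.argmax(loglik)])
```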
                    The noninformative prior (8.68) is improper as it does not satisfy the normalization
                  condition
$$ \int_P \int_{\sigma > 0} p(\theta, \sigma)\, d\sigma\, d\theta = 1 \qquad (8.69) $$
                  and furthermore does not integrate to a finite value. For some aspects of our analysis, this
                  is not a problem as the posterior density is still proper, but when testing hypotheses, it is
important that the prior density also be a proper distribution. We suggest here a very simple fix to this problem: impose a priori upper and lower bounds,

$$ \Theta = \{\theta \mid \theta_{j,\mathrm{lo}} \le \theta_j \le \theta_{j,\mathrm{hi}},\ j = 1, 2, \ldots, N\} \qquad \sigma_{\mathrm{lo}} \le \sigma \le \sigma_{\mathrm{hi}} \qquad (8.70) $$

with indicator functions $I_\theta(\theta)$ and $I_\sigma(\sigma)$ that equal 1 for parameters that satisfy these bounds and equal 0 for values outside of the bounds. We then define the proper prior density
$$ p(\theta, \sigma) = c_0\, \sigma^{-1} I_\theta(\theta)\, I_\sigma(\sigma) \qquad c_0^{-1} = \left[\int_P I_\theta(\theta)\, d\theta\right] \left[\int_{\sigma > 0} \sigma^{-1} I_\sigma(\sigma)\, d\sigma\right] \qquad (8.71) $$

                  We show below why this prior is used, but for the moment, let us accept it as valid and
                  continue our discussion.
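Because the indicator functions confine $\theta$ to a bounded box and $\sigma$ to a finite interval, both integrals in (8.71) have closed forms: the $\theta$ integral is the volume of the box, $\prod_j (\theta_{j,\mathrm{hi}} - \theta_{j,\mathrm{lo}})$, and the $\sigma$ integral is $\ln(\sigma_{\mathrm{hi}}/\sigma_{\mathrm{lo}})$. The following sketch, with hypothetical bounds, evaluates $c_0$ and the resulting proper log prior.

```python
import numpy as np

# Hypothetical bounds for N = 2 parameters and for sigma
theta_lo = np.array([0.0, -5.0])
theta_hi = np.array([10.0, 5.0])
sigma_lo, sigma_hi = 1e-3, 1e2

# Both integrals in (8.71) are available in closed form:
# the theta integral is the volume of the bounding box, and
# int sigma^{-1} dsigma over [sigma_lo, sigma_hi] = ln(sigma_hi/sigma_lo).
box_volume = np.prod(theta_hi - theta_lo)
c0 = 1.0 / (box_volume * np.log(sigma_hi / sigma_lo))

def log_prior(theta, sigma):
    """Log of the proper prior (8.71); -inf outside the bounds."""
    inside = np.all((theta_lo <= theta) & (theta <= theta_hi)) \
             and sigma_lo <= sigma <= sigma_hi
    return np.log(c0) - np.log(sigma) if inside else -np.inf

print(log_prior(np.array([5.0, 0.0]), 1.0))
```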