that $E_2$ occurs be $P(E_2)$. The joint probability that both occur, $P(E_1 \cap E_2)$, is related to the conditional probability that $E_1$ occurs if $E_2$ occurs, $P(E_1 \mid E_2)$, by

$$ P(E_1 \cap E_2) = P(E_1 \mid E_2)\, P(E_2) \qquad (8.43) $$

Similarly, if $P(E_2 \mid E_1)$ is the conditional probability that $E_2$ occurs if $E_1$ occurs, we may write the joint probability as

$$ P(E_1 \cap E_2) = P(E_2 \mid E_1)\, P(E_1) \qquad (8.44) $$

Bayes’ theorem simply states that these two expressions for the joint probability must be equal,

$$ P(E_1 \cap E_2) = P(E_1 \mid E_2)\, P(E_2) = P(E_2 \mid E_1)\, P(E_1) \qquad (8.45) $$

and thus

$$ P(E_2 \mid E_1) = \frac{P(E_1 \mid E_2)\, P(E_2)}{P(E_1)} \qquad (8.46) $$
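As a quick numerical illustration of (8.46), the following Python sketch evaluates a posterior probability for a hypothetical diagnostic-test scenario; the events and the probability values are assumptions chosen for illustration, not taken from the text.

```python
# Hypothetical scenario: E2 = "condition present", E1 = "test is positive".
# All numbers below are assumed values for illustration only.
p_E2 = 0.01              # prior probability P(E2)
p_E1_given_E2 = 0.95     # P(E1|E2): positive test if condition present
p_E1_given_notE2 = 0.05  # P(E1|~E2): false-positive probability

# Total probability P(E1), needed for the denominator of (8.46)
p_E1 = p_E1_given_E2 * p_E2 + p_E1_given_notE2 * (1.0 - p_E2)

# Bayes' theorem (8.46): P(E2|E1) = P(E1|E2) P(E2) / P(E1)
p_E2_given_E1 = p_E1_given_E2 * p_E2 / p_E1
print(p_E2_given_E1)     # ~0.161: the data raise the probability well above the prior
```

The posterior (about 0.16) is much larger than the prior (0.01) but far from certainty, which is the kind of updating of belief by data that the rest of this section formalizes.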
                  So how do we get from this axiom of probability theory to a framework for making inferences
                  from data?


                  Bayesian view of single-response regression
Let us consider the single-response regression problem, where the response value in experiment k of a set of N experiments equals the value determined by the “true” model plus some random error,

$$ y^{[k]} = f\!\left(x^{[k]};\, \theta^{(\mathrm{true})}\right) + \varepsilon^{[k]} \qquad (8.47) $$
$\varepsilon^{[k]}$ is a random error whose stochastic properties are unknown. Here, we have used $\theta^{(\mathrm{true})}$ to denote that the random error is added to the model predictions with the “true” values of the parameters (which are, of course, unknown to us). The predicted response value in each experiment, as a function of $\theta$, is

$$ \hat{y}^{[k]}(\theta) = f\!\left(x^{[k]};\, \theta\right) \qquad (8.48) $$
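The following Python sketch shows the data-generating picture of (8.47) and (8.48). The response model $f(x;\theta) = \theta_1 e^{-\theta_2 x}$, the parameter values, and the error distribution are all assumptions chosen for illustration; they are not the example used in the text.

```python
import numpy as np

# Assumed (hypothetical) single-response model f(x; theta) = theta[0]*exp(-theta[1]*x)
def f(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

rng = np.random.default_rng(0)
theta_true = np.array([2.0, 0.5])        # "true" parameters, unknown in practice
x = np.linspace(0.0, 5.0, 20)            # predictor values x^[k], k = 1..N
eps = rng.normal(0.0, 0.1, size=x.size)  # random measurement errors eps^[k] (assumed Gaussian)

y = f(x, theta_true) + eps               # observed responses, Eq. (8.47)

theta_guess = np.array([1.5, 0.8])       # some candidate parameter vector
y_hat = f(x, theta_guess)                # predicted responses y_hat^[k](theta), Eq. (8.48)
residuals = y - y_hat                    # discrepancies between data and prediction
```

Only the observed $y^{[k]}$ and the predictions $\hat{y}^{[k]}(\theta)$ are available to us; the individual errors $\varepsilon^{[k]}$ and $\theta^{(\mathrm{true})}$ are not.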
If we were to do the same set of experiments again, we would get each time a new vector of measurement errors

$$ \varepsilon = \left[\varepsilon^{[1]} \ \ \varepsilon^{[2]} \ \ \cdots \ \ \varepsilon^{[N]}\right]^{\mathrm{T}} \qquad (8.49) $$
The general difficulty that we face is that we do not really know much about the properties of the random error, yet the accuracy of the model parameter estimates is largely determined by these errors. Therefore, to obtain a tractable approach to analysis, we make a number of assumptions about the nature of the error. We will need to check these assumptions later for consistency with the data, especially if we use our analysis to test hypotheses.
If $\varepsilon$ is truly a measurement error and not a deficiency of the model, we expect that if we do the set of measurements over and over again many times, the expectations (averages) of each $\varepsilon^{[k]}$ will be zero:

$$ E\!\left[\varepsilon^{[k]}\right] = 0 \qquad (8.50) $$
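A minimal sketch of what (8.50) means in practice: if one experiment could be replicated many times, the sample average of its errors should approach zero. The Gaussian error distribution and its standard deviation below are assumptions for illustration only.

```python
import numpy as np

# Replicate the error of a single experiment k many times (assumed zero-mean Gaussian)
rng = np.random.default_rng(1)
eps_samples = rng.normal(0.0, 0.1, size=10_000)
print(eps_samples.mean())  # close to 0, consistent with E[eps^[k]] = 0
```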