The design matrix $X$ collects the predictor values of all $N$ experiments and is known prior to collecting the response data.
                                [1]  [1]       [1]        [1]   [1]      [1]  
                            1   x    x    ...  x            x    x    ...  x
                                1     2         M            1    2         M
                                 [2]  [2]                   [2]   [2]
                          
                           1   x    x    ...  x  [2]     x    x    ...  x  [2] 
                                1     2         M           1    2         M  
                           .    .    .                   .      .          
                                                          
                          
                      X =   .    .    .         . .   X =   .     .        . .      (8.17)
                           .    .    .         .         .      .        . 
                                [N]   [N]      [N]           [N]  [N]       [N]
                            1  x     x    ...  x            x    x    ...  x
                                1     2        M             1    2         M
                                 with y-intercept            without y-intercept
                     The vector of predicted responses in each experiment is then
                                                    [1]  
                                                    ˆ y (θ)
                                                       .
                                                       .   = Xθ                      (8.18)
                                                         
                                                       .
                                            ˆ y(θ) = 
                                                    ˆ y [N] (θ)
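To make the construction concrete, here is a minimal NumPy sketch of both forms of the design matrix in Eq. (8.17) and the prediction $\hat{y}(\theta) = X\theta$ of Eq. (8.18); the arrays `x_data` and `theta` are hypothetical placeholders chosen purely for illustration.

```python
import numpy as np

# Hypothetical predictor data: N = 4 experiments, M = 2 predictor variables.
# Row k holds the predictor values (x_1^[k], x_2^[k]) of experiment k.
x_data = np.array([[0.1, 1.0],
                   [0.2, 0.9],
                   [0.3, 1.1],
                   [0.4, 1.2]])
N, M = x_data.shape

# Design matrix with a y-intercept: prepend a column of ones (Eq. 8.17, left).
X_with = np.column_stack([np.ones(N), x_data])   # shape (N, M + 1)

# Design matrix without a y-intercept (Eq. 8.17, right).
X_without = x_data                               # shape (N, M)

# Predicted responses y_hat(theta) = X theta (Eq. 8.18) for an assumed theta.
theta = np.array([0.5, 2.0, -1.0])               # [intercept, coef_1, coef_2]
y_hat = X_with @ theta
```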
                   Linear least-squares regression
We vary $\theta$ until the model predictions $\hat{y}^{[k]}(\theta)$ agree most closely with the observed $y^{[k]}$. We must define what we mean by "close agreement," but a readily apparent choice of metric is to select the value $\theta_{LS}$ that minimizes the sum of squared errors
$$
S(\theta) \equiv \sum_{k=1}^{N} \left( y^{[k]} - \hat{y}^{[k]}(\theta) \right)^2 = \left| y - \hat{y}(\theta) \right|^2
\tag{8.19}
$$
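Continuing the sketch above, Eq. (8.19) is a single line in NumPy; the observed responses `y_obs` are again hypothetical values, not data from the text.

```python
# Hypothetical observed responses, one per experiment.
y_obs = np.array([1.2, 0.8, 1.1, 0.7])

# S(theta) = sum_k (y^[k] - y_hat^[k](theta))^2  (Eq. 8.19)
S = np.sum((y_obs - y_hat) ** 2)

# Equivalently, the squared Euclidean norm of the residual vector.
assert np.isclose(S, np.linalg.norm(y_obs - X_with @ theta) ** 2)
```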
That is, $\theta_{LS}$ satisfies the optimality conditions

$$
\left. \frac{\partial S}{\partial \theta^{\mathrm{T}}} \right|_{\theta_{LS}} = 0
\qquad
\left. \nabla^2 S(\theta) \right|_{\theta_{LS}} > 0
\tag{8.20}
$$
Substituting $\hat{y}(\theta) = X\theta$ yields

$$
S(\theta) = [y - X\theta]^{\mathrm{T}}[y - X\theta] = y^{\mathrm{T}}y - (X\theta)^{\mathrm{T}}y - y^{\mathrm{T}}(X\theta) + (X\theta)^{\mathrm{T}}(X\theta)
\tag{8.21}
$$
                   Taking the derivative with respect to θ,
$$
\frac{\partial S}{\partial \theta^{\mathrm{T}}} = 0 - X^{\mathrm{T}}y - X^{\mathrm{T}}y + \left( X^{\mathrm{T}}X + X^{\mathrm{T}}X \right)\theta = -2X^{\mathrm{T}}y + 2(X^{\mathrm{T}}X)\theta
\tag{8.22}
$$
and setting it equal to zero yields a linear system for $\theta_{LS}$,

$$
(X^{\mathrm{T}}X)\,\theta_{LS} = X^{\mathrm{T}}y \quad\Rightarrow\quad \theta_{LS} = (X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}y
\tag{8.23}
$$
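A sketch of Eq. (8.23) with the arrays defined above. In numerical practice one avoids forming $(X^{\mathrm{T}}X)^{-1}$ explicitly: solving the linear system directly, or using an orthogonal-factorization routine such as NumPy's `np.linalg.lstsq`, is cheaper and better conditioned.

```python
# Form the normal equations (X^T X) theta_LS = X^T y and solve the linear
# system directly rather than inverting X^T X.
XtX = X_with.T @ X_with
Xty = X_with.T @ y_obs
theta_LS = np.linalg.solve(XtX, Xty)

# Cross-check against NumPy's built-in least-squares solver.
theta_check, *_ = np.linalg.lstsq(X_with, y_obs, rcond=None)
assert np.allclose(theta_LS, theta_check)
```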
$X^{\mathrm{T}}X$ is a $P \times P$ matrix; its size is governed by the number of fitted parameters, $P = \dim(\theta)$. The $(i, j)$ element of $X^{\mathrm{T}}X$ is

$$
(X^{\mathrm{T}}X)_{ij} = \sum_{k=1}^{N} X_{ki} X_{kj}
\tag{8.24}
$$
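As a quick check of Eq. (8.24), the explicit sum over experiments reproduces an element of the matrix product; the same sum structure also illustrates the remark below that the elements grow as experiments are added.

```python
# (X^T X)_ij = sum_{k=1}^{N} X_ki X_kj  (Eq. 8.24), checked for one (i, j).
i, j = 0, 1
explicit_sum = sum(X_with[k, i] * X_with[k, j] for k in range(N))
assert np.isclose(XtX[i, j], explicit_sum)

# Repeating the same design (doubling N) doubles every element of X^T X.
X_doubled = np.vstack([X_with, X_with])
assert np.allclose(X_doubled.T @ X_doubled, 2 * XtX)
```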
As we increase the number of experiments $N$, the magnitudes of the elements of $X^{\mathrm{T}}X$ increase. $X^{\mathrm{T}}X$ contains information about the ability of the experimental design to probe the