the additional cost accrued after time t if we were to implement the optimal inputs at all
subsequent times s ∈ [t, t_H],

$$V(t, x) = \min_{u(s>t)} \left\{ \int_t^{t_H} \sigma(s, x(s), u(s))\,ds + \pi(x(t_H)) \right\} \qquad (5.156)$$
V(t_0, x^{[0]}) is the optimal value of F[u(t); x^{[0]}]. At the horizon time V(t_H, x) = π(x), but
how do we work backwards in time from this known result, and how, from such a calculation,
do we determine the optimal u(t)?
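Before turning to the backward-in-time construction, it may help to see how the bracketed cost in (5.156) is evaluated for one particular control trajectory. The sketch below is purely illustrative: the dynamics f, running cost σ, and terminal cost π are hypothetical choices (not from the text), and the integral is approximated by simple explicit Euler steps.

```python
def cost_functional(x0, u_of_t, f, sigma, pi, t0, tH, N=1000):
    """Evaluate F[u(t); x0] = int_{t0}^{tH} sigma(s, x, u) ds + pi(x(tH))
    for one candidate control trajectory u_of_t, by explicit Euler integration."""
    dt = (tH - t0) / N
    x = float(x0)
    J = 0.0
    for k in range(N):
        t = t0 + k * dt
        u = u_of_t(t)
        J += sigma(t, x, u) * dt   # accumulate the running cost
        x += f(t, x, u) * dt       # advance the state along dx/dt = f(t, x, u)
    return J + pi(x)               # add the terminal cost pi(x(t_H))

# Hypothetical example problem: dx/dt = -x + u, sigma = x^2 + u^2, pi(x) = x^2.
f     = lambda t, x, u: -x + u
sigma = lambda t, x, u: x**2 + u**2
pi    = lambda x: x**2
print(cost_functional(1.0, lambda t: 0.0, f, sigma, pi, 0.0, 1.0))
```

V(t, x) is then the minimum of this quantity over all admissible control trajectories u(s > t).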
                    We obtain a differential equation for V (t, x) by taking the derivative of (5.156), evaluated
                  at the optimal control input u(t),
$$\frac{d}{dt} V(t, x) = -\sigma(t, x(t), u(t)) \qquad (5.157)$$
                  We relate this total time derivative to partial derivatives of V (t, x),
$$\frac{d}{dt} V(t, x(t)) = \frac{\partial V}{\partial t} + \nabla V \cdot \frac{dx}{dt} = \frac{\partial V}{\partial t} + \nabla V \cdot f(t, x, u) \qquad (5.158)$$
                  Thus, for the optimal control trajectory, (5.157) and (5.158) yield the partial differential
                  equation

$$\left.\frac{\partial V}{\partial t}\right|_{\mathrm{opt}} = -\sigma(t, x, u) - \nabla V \cdot f(t, x, u) \qquad (5.159)$$
We next define the “backward” time τ = t_H − t and rewrite the Bellman function as
ϕ(τ, x) = V(t, x), which satisfies the “initial” condition ϕ(0, x) = π(x). We then
rewrite (5.159) as

$$\left.\frac{\partial \varphi}{\partial \tau}\right|_{\mathrm{opt}} = \sigma(t_H - \tau, x, u) + \nabla\varphi \cdot f(t_H - \tau, x, u) \qquad (5.160)$$
                  For nonoptimal trajectories, ϕ(τ, x) increases faster with increasing τ than for the optimal
                  trajectory, so that, in general,
$$\frac{\partial \varphi}{\partial \tau} \ge \sigma(t_H - \tau, x, u) + \nabla\varphi \cdot f(t_H - \tau, x, u) \qquad (5.161)$$
                  Thus, for the optimal trajectory, we have the following PDE for ϕ(τ, x), known as the
                  Hamilton–Jacobi–Bellman (HJB) equation:
$$\frac{\partial \varphi}{\partial \tau} = \min_{u(\tau, x)} \left[ \sigma(t_H - \tau, x, u) + \nabla\varphi \cdot f(t_H - \tau, x, u) \right] \qquad (5.162)$$
                  We work backwards in time, and at each (τ, x), use the input u(τ, x) that minimizes the term
in the square brackets. Note that it is possible to have ∂ϕ/∂τ < 0 even if σ(t_H − τ, x, u) > 0,
since the requirement for monotonic decrease in V(t, x) with increasing t is only that
dϕ/dτ = ∂ϕ/∂τ − ∇ϕ · f > 0.
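As a concrete picture of this backward marching, the following sketch advances ϕ(τ, x) from ϕ(0, x) = π(x) according to (5.162) for a hypothetical scalar problem (the same illustrative f, σ, and π as in the sketch above, not from the text). Here ∇ϕ is approximated by finite differences and the minimization over u is done by brute force over a candidate grid; a production solver would use an upwind discretization and a stability-limited step in τ, which this sketch omits.

```python
import numpy as np

# Hypothetical scalar problem: dx/dt = -x + u, sigma = x^2 + u^2, pi(x) = x^2.
f     = lambda t, x, u: -x + u
sigma = lambda t, x, u: x**2 + u**2
pi    = lambda x: x**2

tH, n_tau, n_x = 1.0, 200, 101
dtau = tH / n_tau
x_grid = np.linspace(-2.0, 2.0, n_x)
dx = x_grid[1] - x_grid[0]
u_cand = np.linspace(-2.0, 2.0, 41)         # brute-force candidate control values

phi = pi(x_grid)                             # "initial" condition phi(0, x) = pi(x)
u_opt = np.zeros((n_tau, n_x))               # minimizing control u(tau, x) at each step

for k in range(n_tau):
    tau = k * dtau
    grad_phi = np.gradient(phi, dx)          # finite-difference estimate of grad(phi)
    # Right-hand side of (5.162) for every (candidate u, grid point x) pair.
    rhs = (sigma(tH - tau, x_grid[None, :], u_cand[:, None])
           + grad_phi[None, :] * f(tH - tau, x_grid[None, :], u_cand[:, None]))
    best = rhs.argmin(axis=0)                # index of the minimizing control at each x
    u_opt[k, :] = u_cand[best]
    phi = phi + dtau * rhs[best, np.arange(n_x)]   # explicit Euler step in tau
```

Storing the minimizing u(τ, x) at every step yields a feedback law defined over the whole state grid, which is what distinguishes the dynamic programming approach from computing a single optimal trajectory.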
                    In an open-loop control problem, we compute the optimal control input trajectory and
then fully implement it over the entire period t_0 ≤ t ≤ t_H. For this, the direct approach