Page 260 - A First Course In Stochastic Models
P. 260

LINEAR PROGRAMMING APPROACH                   253


                Step 2. Start with the non-empty set I 0 := {i |  x  ∗  > 0}. For any state
                                                          a∈A(i) ia
                i ∈ I 0 , set the decision
                                 R := a  for some a such that x > 0.
                                  ∗
                                                            ∗
                                  i                         ia
                                                                   ∗
                Step 3. If I 0 = I, then the algorithm is stopped with policy R . Otherwise, deter-
                mine some state i /∈ I 0 and action a ∈ A(i) such that p ij (a) > 0 for some j ∈ I 0 .
                Next set R := a and I 0 := I 0 ∪ {i} and repeat step 3.
                         ∗
                         i
                  The linear program (6.5.1) can heuristically be explained by interpreting the
                variables x ia as
                          x ia = the long-run fraction of decision epochs at which
                               the system is in state i and action a is made.

                The objective of the linear program is the minimization of the long-run average
                cost per time unit, while the first set of constraints represent the balance equations
                requiring that for any state j ∈ I the long-run average number of transitions from
                state j per time unit must be equal to the long-run average number of transitions
                into state j per time unit. The last constraint obviously requires that the sum of
                the fractions x ia must be equal to 1.
                  Next we sketch a proof that the linear programming algorithm leads to an average
                                 ∗
                cost optimal policy R when the weak unichain assumption is satisfied. Our starting
                point is the average cost optimality equation (6.4.3). Since this equation is solvable,
                the linear inequalities


                            g + v i −  p ij (a)v j ≤ c i (a),  a ∈ A(i) and i ∈ I  (6.5.2)
                                    j∈I
                must have a solution. It follows from Theorem 6.2.1 that any solution {g, v i } to
                these inequalities satisfies g ≤ g i (R) for any i ∈ I and any policy R. Hence we
                can conclude that for any solution {g, v i } to the linear inequalities (6.5.2) holds
                         ∗
                that g ≤ g with g being the minimal average cost per time unit. Hence, using
                                ∗
                the fact that relative values v , i ∈ I, exist such that {g , v } constitutes a solution
                                       ∗
                                                               ∗
                                                             ∗
                                       i                       i
                to (6.5.2), the linear program
                                            Maximize g                       (6.5.3)
                subject to

                           g + v i −  p ij (a)v j ≤ c i (a),  a ∈ A(i) and i ∈ I,
                                   j∈I
                                        g, v i unrestricted,

                                           ∗
                has the minimal average cost g as the optimal objective-function value. Next
                observe that the linear program (6.5.1) is the dual of the primal linear program
   255   256   257   258   259   260   261   262   263   264   265