Page 260 - A First Course In Stochastic Models
P. 260
LINEAR PROGRAMMING APPROACH 253
Step 2. Start with the non-empty set I 0 := {i | x ∗ > 0}. For any state
a∈A(i) ia
i ∈ I 0 , set the decision
R := a for some a such that x > 0.
∗
∗
i ia
∗
Step 3. If I 0 = I, then the algorithm is stopped with policy R . Otherwise, deter-
mine some state i /∈ I 0 and action a ∈ A(i) such that p ij (a) > 0 for some j ∈ I 0 .
Next set R := a and I 0 := I 0 ∪ {i} and repeat step 3.
∗
i
The linear program (6.5.1) can heuristically be explained by interpreting the
variables x ia as
x ia = the long-run fraction of decision epochs at which
the system is in state i and action a is made.
The objective of the linear program is the minimization of the long-run average
cost per time unit, while the first set of constraints represent the balance equations
requiring that for any state j ∈ I the long-run average number of transitions from
state j per time unit must be equal to the long-run average number of transitions
into state j per time unit. The last constraint obviously requires that the sum of
the fractions x ia must be equal to 1.
Next we sketch a proof that the linear programming algorithm leads to an average
∗
cost optimal policy R when the weak unichain assumption is satisfied. Our starting
point is the average cost optimality equation (6.4.3). Since this equation is solvable,
the linear inequalities
g + v i − p ij (a)v j ≤ c i (a), a ∈ A(i) and i ∈ I (6.5.2)
j∈I
must have a solution. It follows from Theorem 6.2.1 that any solution {g, v i } to
these inequalities satisfies g ≤ g i (R) for any i ∈ I and any policy R. Hence we
can conclude that for any solution {g, v i } to the linear inequalities (6.5.2) holds
∗
that g ≤ g with g being the minimal average cost per time unit. Hence, using
∗
the fact that relative values v , i ∈ I, exist such that {g , v } constitutes a solution
∗
∗
∗
i i
to (6.5.2), the linear program
Maximize g (6.5.3)
subject to
g + v i − p ij (a)v j ≤ c i (a), a ∈ A(i) and i ∈ I,
j∈I
g, v i unrestricted,
∗
has the minimal average cost g as the optimal objective-function value. Next
observe that the linear program (6.5.1) is the dual of the primal linear program