Page 272 - A First Course In Stochastic Models
P. 272
VALUE-ITERATION ALGORITHM 265
where R u and R v are the actions which are prescribed by policy R(n) in the states
u and v.
Step 4(b). For each i ∈ I, change V n (i) according to
V n (i) := V n−1 (i) + ω{V n (i) − V n−1 (i)}.
Step 4(c). n := n + 1 and go to step 1.
In the case of a tie when selecting in step 4(a) the state u for which the minimum
in m n is obtained, it is conventional to choose the minimizing state of the previous
iteration when that state is one of the candidates to choose; otherwise, choose the
first state achieving the minimum in m n . The same convention is used for the
maximizing action v in M n .
The choice of the dynamic relaxation factor ω is motivated as follows. We change
the estimate V n (i) as V n (i) = V n−1 (i) + ω{V n (i) − V n−1 (i)} for all i in order to
accomplish at the (n + 1)th iteration that
c u (R u ) + p uj (R u )V n (j) − V n (u) = c v (R v ) + p vj (R v )V n (j) − V n (v),
j∈I j∈I
in the implicit hope that the difference between the new upper and lower bounds
M n+1 and m n+1 will decrease more quickly. Using the relation m n = V n (u) −
V n−1 (u) = c u (R u )+ j p uj (R u )V n−1 (j)−V n−1 (u) and the similar relation for M n ,
it is a matter of simple algebra to verify from the above condition the expression
for ω. We omit the easy proof that ω > 0. Numerical experiments indicate that
using a dynamic relaxation factor in value iteration often greatly enhances the
speed of convergence of the algorithm. The modified value-iteration algorithm is
theoretically not guaranteed to converge, but in practice the algorithm will usually
work very well. It is important to note that the relaxation factor ω is kept outside
the recursion equation in step 1 so that the bounds m n and M n in step 2 are not
destroyed. Although the bounds apply, it is no longer true that the sequences {m n }
and {M n } are monotonic.
To conclude this section, we apply value iteration to two examples. The first
example concerns the maintenance problem from Example 6.1.1 and the second
example illustrates the usefulness of value iteration for the computation of perfor-
mance measures for a single Markov chain.
Example 6.1.1 (continued) A maintenance problem
For the maintenance problem the recursion equation (6.6.1) becomes
N
V n (1) = 0 + q 1j V n−1 (j),
j=1
N
V n (i) = min 0 + q ij V n−1 (j), C pi + V n−1 (1) , 1 < i < N,
j=i