Page 270 - A First Course In Stochastic Models
P. 270
VALUE-ITERATION ALGORITHM 263
Theorem 6.6.3 In the standard value-iteration algorithm the lower and upper
bounds satisfy
m k+1 ≥ m k and M k+1 ≤ M k for all k ≥ 1.
Proof By the definition of policy R(n),
V n (i) = c i (R i (n)) + p ij (R i (n))V n−1 (j), i ∈ I. (6.6.8)
j∈I
In the same way as (6.6.5) was obtained, we find for any policy R that
c i (R i ) + p ij (R i )V n−1 (j) ≥ V n (i), i ∈ I. (6.6.9)
j∈I
Taking n = k in (6.6.8) and taking n = k + 1 and R = R(k) in (6.6.9) gives
V k+1 (i) − V k (i) ≤ p ij (R i (k)){V k (j) − V k−1 (j)}, i ∈ I. (6.6.10)
j∈I
Similarly, by taking n = k + 1 in (6.6.8) and taking n = k and R = R(k + 1) in
(6.6.9), we find
V k+1 (i) − V k (i) ≥ p ij (R i (k + 1)) {V k (j) − V k−1 (j)} , i ∈ I. (6.6.11)
j∈I
Since V k (j) − V k−1 (j) ≤ M k for all j ∈ I and j∈I p ij (R i (k)) = 1, it follows
from (6.6.10) that V k+1 (i) − V k (i) ≤ M k for all i ∈ I. This gives M k+1 ≤ M k .
Similarly, we obtain from (6.6.11) that m k+1 ≥ m k .
Data transformation
The periodicity issue can be circumvented by a perturbation of the one-step transi-
tion probabilities. The perturbation technique is based on the following two obser-
vations. First, a recurrent state allowing for a direct transition to itself must be
aperiodic. Second, the relative frequencies at which the states of a Markov chain
are visited do not change when the state changes are delayed with a constant factor
and the probability of a self-transition is accordingly enlarged. In other words, if
the one-step transition probabilities p ij of a Markov chain {X n } are perturbed as
p = τp ij for j = i and p = τp ii + 1 − τ for some constant τ with 0 < τ < 1,
ij ii
the perturbed Markov chain {X n } with one-step transition probabilities p is aperi-
ij
odic and has the same equilibrium probabilities as the original Markov chain {X n }
(verify). Thus a Markov decision model involving periodicities may be perturbed
as follows. Choosing some constant τ with 0 < τ < 1, the state space, the action
sets, the one-step costs and the one-step transition probabilities of the perturbed