Page 267 - A First Course In Stochastic Models

DISCRETE-TIME MARKOV DECISION PROCESSES

Intuitively, one might expect that the one-step difference V_n(i) − V_{n−1}(i) will come very close to the minimal average cost per time unit, and that the stationary policy whose actions minimize the right-hand side of (6.6.1) for all i will be very close in cost to the minimal average cost. However, these matters turn out to be rather subtle for the average cost criterion, owing to possible periodicities in the underlying decision processes. Before explaining this in more detail, we first investigate
an operator which is induced by the recursion equation (6.6.1). The operator T assigns to each function v = (v_i, i ∈ I) a function Tv whose ith component (Tv)_i is defined by
\[
(Tv)_i = \min_{a \in A(i)} \Big\{ c_i(a) + \sum_{j \in I} p_{ij}(a)\, v_j \Big\}, \qquad i \in I. \tag{6.6.2}
\]
Note that (Tv)_i = V_n(i) if v_i = V_{n−1}(i), i ∈ I. The following theorem plays a key role in the value-iteration algorithm.
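In code, one application of the operator T amounts to a single minimization per state. The following Python sketch is ours, not the book's, and assumes a hypothetical data layout: costs[i] is the vector of costs c_i(a) over the actions a ∈ A(i), and probs[i] is a matrix whose row a holds the transition probabilities p_ij(a) over all states j.

```python
import numpy as np

def apply_T(v, costs, probs):
    """One application of the operator T from (6.6.2).

    For each state i, compute
        (Tv)_i = min over a in A(i) of  c_i(a) + sum_j p_ij(a) v_j
    and record a minimizing action, i.e. the policy R(v).

    The data layout (costs[i]: 1-D cost vector over A(i);
    probs[i]: 2-D array whose row a holds p_ij(a)) is an
    illustrative assumption, not the book's notation.
    """
    n = len(v)
    Tv = np.empty(n)
    policy = np.empty(n, dtype=int)
    for i in range(n):
        q = costs[i] + probs[i] @ v      # c_i(a) + sum_j p_ij(a) v_j, one entry per action
        policy[i] = int(np.argmin(q))    # a minimizing action R_i(v)
        Tv[i] = q[policy[i]]
    return Tv, policy
```

Iterating v ← Tv starting from v_i = V_{n−1}(i) then reproduces the value-iteration recursion V_n(i) = (Tv)_i.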

Theorem 6.6.1 Suppose that the weak unichain assumption is satisfied. Let v = (v_i) be given. Define the stationary policy R(v) as a policy which assigns to each state i ∈ I an action a = R_i(v) that minimizes the right-hand side of (6.6.2). Then

                                            ∗
                           min {(T v) i − v i } ≤ g ≤ g s (R(v)) ≤ max {(T v) i − v i }  (6.6.3)
                           i∈I                            i∈I
                                   ∗
                for any s ∈ I, where g is the minimal long-run average cost per time unit and
                g s (R(v)) denotes the long-run average cost per time unit under policy R(v) when
                the initial state is s.
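The bounds (6.6.3) suggest a practical stopping rule for value iteration: iterate v ← Tv and halt once min_i{(Tv)_i − v_i} and max_i{(Tv)_i − v_i} are sufficiently close, since both bracket g^*. A minimal Python sketch of this scheme, under a hypothetical data layout in which costs[i] is the vector of costs c_i(a) over A(i) and probs[i] a matrix whose row a holds p_ij(a); as noted at the start of this section, periodicities may keep the bounds from converging, so in practice one would first make the chains aperiodic (e.g. by a damping transformation).

```python
import numpy as np

def value_iteration(costs, probs, tol=1e-8, max_iter=100_000):
    """Value iteration with the bounds of (6.6.3).

    Repeatedly apply the operator T of (6.6.2); at each step
        m = min_i {(Tv)_i - v_i}  and  M = max_i {(Tv)_i - v_i}
    bracket the minimal average cost g*, so we stop once
    M - m <= tol and return the midpoint together with the
    policy R(v).  The data layout (costs[i], probs[i]) is an
    illustrative assumption, not the book's notation.
    """
    n = len(costs)
    v = np.zeros(n)
    for _ in range(max_iter):
        Tv = np.empty(n)
        policy = np.empty(n, dtype=int)
        for i in range(n):
            q = costs[i] + probs[i] @ v          # c_i(a) + sum_j p_ij(a) v_j
            policy[i] = int(np.argmin(q))
            Tv[i] = q[policy[i]]
        m, M = np.min(Tv - v), np.max(Tv - v)    # lower/upper bounds on g*
        if M - m <= tol:
            return 0.5 * (m + M), policy
        v = Tv
    raise RuntimeError("bounds did not converge; the chains may be periodic")
```

On a two-state example with one aperiodic recurrent class, the gap M − m shrinks geometrically and the returned midpoint pins down g^* to within tol/2.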
Proof  To prove the first inequality, choose any stationary policy R. By the definition of (Tv)_i, we have for any state i ∈ I that

\[
(Tv)_i \le c_i(a) + \sum_{j \in I} p_{ij}(a)\, v_j, \qquad a \in A(i), \tag{6.6.4}
\]
where equality holds for a = R_i(v). Choosing a = R_i in (6.6.4) gives

\[
(Tv)_i \le c_i(R_i) + \sum_{j \in I} p_{ij}(R_i)\, v_j, \qquad i \in I. \tag{6.6.5}
\]
                Define the lower bound

\[
m = \min_{i \in I}\{(Tv)_i - v_i\}.
\]
Since m ≤ (Tv)_i − v_i for all i, it follows from (6.6.5) that
\[
m + v_i \le c_i(R_i) + \sum_{j \in I} p_{ij}(R_i)\, v_j \quad \text{for all } i \in I,
\]
and so
\[
c_i(R_i) - m + \sum_{j \in I} p_{ij}(R_i)\, v_j \ge v_i, \qquad i \in I.
\]