
THE SEMI-MARKOV DECISION MODEL

Define the random variable $Z(t)$ by
$$Z(t) = \text{the total costs incurred up to time } t, \qquad t \ge 0.$$

Fix now a stationary policy $R$. Denote by $E_{i,R}$ the expectation operator when the
initial state $X_0 = i$ and policy $R$ is used. Then the limit
$$g_i(R) = \lim_{t\to\infty} \frac{1}{t}\, E_{i,R}[Z(t)]$$
exists for all $i \in I$. This result can be proved by using the renewal-reward theorem
in Section 2.2. The details are omitted. Just as in the discrete-time model, we
can give a stronger interpretation to the average cost $g_i(R)$. If the initial state $i$
is recurrent under policy $R$, then the long-run actual average cost per time unit
equals $g_i(R)$ with probability 1. If the Markov chain $\{X_n\}$ associated with policy
$R$ has no two disjoint closed sets, the average cost $g_i(R)$ does not depend on the
initial state $X_0 = i$.
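
To make the limit concrete, here is a minimal Python sketch (not from the text) that estimates $g_i(R)$ from one long sample path of a hypothetical two-state semi-Markov process under a fixed stationary policy. The transition matrix `P`, the one-step costs `c`, the sojourn-time means `tau`, the exponential sojourn distribution and the lump-sum cost convention at each decision epoch are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical two-state model under a fixed stationary policy R
# (all numbers are illustrative assumptions, not from the text).
P = np.array([[0.3, 0.7],      # embedded transition matrix p_ij(R_i)
              [0.6, 0.4]])
c = np.array([4.0, 1.0])       # expected cost c_i(R_i) at a decision epoch in state i
tau = np.array([2.0, 5.0])     # mean sojourn time tau_i(R_i) in state i

def average_cost(i, horizon):
    """Estimate g_i(R) = lim (1/t) E[Z(t)] along one long sample path."""
    t, total_cost, state = 0.0, 0.0, i
    while t < horizon:
        total_cost += c[state]             # cost incurred at this decision epoch
        t += rng.exponential(tau[state])   # sojourn time until the next epoch
        state = rng.choice(2, p=P[state])  # next state of the embedded chain {X_n}
    return total_cost / t

# The two estimates nearly coincide: the embedded chain here has no two
# disjoint closed sets, so g_i(R) does not depend on the initial state.
print(average_cost(0, 1e6), average_cost(1, 1e6))
```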

Theorem 7.1.1 Suppose that the embedded Markov chain $\{X_n\}$ associated with
policy $R$ has no two disjoint closed sets. Then
$$\lim_{t\to\infty} \frac{Z(t)}{t} = g(R) \quad \text{with probability 1} \tag{7.1.1}$$
for each initial state $X_0 = i$, where the constant $g(R)$ is given by
$$g(R) = \sum_{j\in I} c_j(R_j)\,\pi_j(R) \Big/ \sum_{j\in I} \tau_j(R_j)\,\pi_j(R)$$
with $\{\pi_j(R)\}$ denoting the equilibrium distribution of the Markov chain $\{X_n\}$.
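
For the same illustrative two-state model as above, $g(R)$ can be computed exactly from the theorem's formula: solve $\pi P = \pi$, $\sum_j \pi_j = 1$ for the equilibrium distribution of the embedded chain, then form the cost-to-time ratio. A sketch, assuming the hypothetical numbers from the previous block:

```python
import numpy as np

# Same hypothetical two-state model as in the simulation sketch above.
P = np.array([[0.3, 0.7],
              [0.6, 0.4]])
c = np.array([4.0, 1.0])
tau = np.array([2.0, 5.0])

# Equilibrium distribution pi(R) of the embedded chain {X_n}: the left
# eigenvector of P for eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi /= pi.sum()

# g(R) = sum_j c_j(R_j) pi_j(R) / sum_j tau_j(R_j) pi_j(R)
g = (c @ pi) / (tau @ pi)
print(pi, g)   # pi = [6/13, 7/13], g = 31/47 ≈ 0.6596
```

The sample-path estimates from the earlier sketch should converge to this value, in line with (7.1.1).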
Proof  We give only a sketch of the proof of (7.1.1). The key to the proof of
(7.1.1) is that
$$\lim_{t\to\infty} \frac{Z(t)}{t} = \lim_{m\to\infty} \frac{E(\text{costs over the first } m \text{ decision epochs})}{E(\text{time over the first } m \text{ decision epochs})} \tag{7.1.2}$$
with probability 1. To verify this relation, fix a recurrent state $r$ and suppose that
$X_0 = r$. Let a cycle be defined as the time elapsed between two consecutive
transitions into state $r$. By the renewal-reward theorem in Section 2.2,
$$\lim_{t\to\infty} \frac{Z(t)}{t} = \frac{E(\text{costs incurred during one cycle})}{E(\text{length of one cycle})}$$
                with probability 1. By the expected-value version of the renewal-reward theorem,
$$\lim_{m\to\infty} \frac{1}{m}\, E(\text{costs over the first } m \text{ decision epochs}) = \frac{E(\text{costs incurred during one cycle})}{E(\text{number of transitions in one cycle})}$$
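
The cycle-based ratio above can be checked numerically on the same hypothetical model: simulate many regeneration cycles that start and end in a recurrent state $r$, and compare the ratio of mean cycle cost to mean cycle length with the exact $g(R)$ computed earlier. A sketch under the same illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Same hypothetical model; state r = 0 serves as the regeneration state.
P = np.array([[0.3, 0.7],
              [0.6, 0.4]])
c = np.array([4.0, 1.0])
tau = np.array([2.0, 5.0])
r = 0

costs, lengths = [], []
for _ in range(50_000):                  # simulate many cycles
    cycle_cost = cycle_length = 0.0
    state = r
    while True:
        cycle_cost += c[state]           # cost at this decision epoch
        cycle_length += rng.exponential(tau[state])
        state = rng.choice(2, p=P[state])
        if state == r:                   # a cycle ends on the return to state r
            break
    costs.append(cycle_cost)
    lengths.append(cycle_length)

# E(costs incurred during one cycle) / E(length of one cycle) ≈ g(R)
print(np.mean(costs) / np.mean(lengths))   # close to 31/47 ≈ 0.6596
```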