Page 289 - A First Course In Stochastic Models

THE SEMI-MARKOV DECISION MODEL


$$
\overline{p}_{ij}(a) =
\begin{cases}
(\tau/\tau_i(a))\,p_{ij}(a), & j \neq i,\ a \in A(i) \text{ and } i \in I,\\[4pt]
(\tau/\tau_i(a))\,p_{ij}(a) + \bigl[1 - (\tau/\tau_i(a))\bigr], & j = i,\ a \in A(i) \text{ and } i \in I.
\end{cases}
$$
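As a quick sanity check, the transformed probabilities can be sketched numerically. The matrix `p`, the mean times `tau_i` and the constant `tau` below are hypothetical illustration data, chosen only so that $0 < \tau \le \min_i \tau_i(a)$; each row of the transformed matrix must again sum to one.

```python
import numpy as np

# Hypothetical 2-state example under a fixed policy R:
# p[i][j] = p_ij(R_i), tau_i[i] = tau_i(R_i), and tau a constant
# with 0 < tau <= min_i tau_i, so the diagonal terms stay nonnegative.
p = np.array([[0.2, 0.8],
              [0.5, 0.5]])
tau_i = np.array([2.0, 4.0])
tau = 1.0

# Transformed one-step probabilities of the discrete-time model:
# (tau/tau_i) * p_ij off the diagonal, plus the extra self-loop
# mass 1 - tau/tau_i added on the diagonal (case j = i).
p_bar = (tau / tau_i)[:, None] * p + np.diag(1.0 - tau / tau_i)

print(p_bar.sum(axis=1))  # each row sums to 1
```

The diagonal correction simply redistributes the "missing" probability mass $1-\tau/\tau_i(a)$ as a self-loop, which is what makes the transformed matrix stochastic again.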
This discrete-time Markov decision model has the same class of stationary policies as the original semi-Markov decision model. For each stationary policy $R$, let $\overline{g}_i(R)$ denote the long-run average cost per time unit in the discrete-time model when policy $R$ is used and the initial state is $i$. Then it holds for each stationary policy $R$ that
$$
\overline{g}_i(R) = g_i(R), \qquad i \in I. \tag{7.1.3}
$$
This result does not require any assumption about the chain structure of the Markov chains associated with the stationary policies. However, we prove the result (7.1.3) only for the unichain case. Fix a stationary policy $R$ and assume that the embedded Markov chain $\{X_n\}$ in the semi-Markov model has no two disjoint closed sets. Denote by $\overline{X}_n$ the state at the $n$th decision epoch in the transformed discrete-time model. It is directly seen that the Markov chain $\{\overline{X}_n\}$ is also unichain under policy $R$. The equilibrium probabilities $\overline{\pi}_j(R)$ of the Markov chain $\{\overline{X}_n\}$ satisfy the equilibrium equations


$$
\overline{\pi}_j(R) = \sum_{i \in I} \overline{\pi}_i(R)\,\overline{p}_{ij}(R_i)
= \sum_{i \in I} \overline{\pi}_i(R)\,\frac{\tau}{\tau_i(R_i)}\,p_{ij}(R_i)
+ \left[1 - \frac{\tau}{\tau_j(R_j)}\right]\overline{\pi}_j(R), \qquad j \in I.
$$
Hence, letting $u_j = \overline{\pi}_j(R)/\tau_j(R_j)$ and dividing by $\tau$, we find that
$$
u_j = \sum_{i \in I} u_i\,p_{ij}(R_i), \qquad j \in I.
$$
                These equations are precisely the equilibrium equations for the equilibrium prob-
                abilities π j (R) of the embedded Markov chain {X n } in the semi-Markov model.
                The equations determine the π j (R) uniquely up to a multiplicative constant. Thus,
for some constant $\gamma > 0$,
$$
\pi_j(R) = \gamma\,\frac{\overline{\pi}_j(R)}{\tau_j(R_j)}, \qquad j \in I.
$$

Since $\sum_{j \in I} \overline{\pi}_j(R) = 1$, it follows that $\gamma = \sum_{j \in I} \tau_j(R_j)\,\pi_j(R)$. The desired result (7.1.3) now follows easily. We have
$$
\overline{g}(R) = \sum_{j \in I} \overline{c}_j(R_j)\,\overline{\pi}_j(R)
= \frac{1}{\gamma} \sum_{j \in I} \frac{c_j(R_j)}{\tau_j(R_j)}\,\pi_j(R)\,\tau_j(R_j)
= \sum_{j \in I} c_j(R_j)\,\pi_j(R) \Big/ \sum_{j \in I} \tau_j(R_j)\,\pi_j(R)
$$
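The whole argument can also be checked numerically on a small example. The two-state data below (`p`, `tau_i`, `c`, `tau`) are hypothetical, and `stationary` is a small helper (not from the text) that solves the equilibrium equations by least squares; the sketch compares the semi-Markov average cost $\sum_j c_j(R_j)\pi_j(R)\big/\sum_j \tau_j(R_j)\pi_j(R)$ with the average cost of the transformed discrete-time chain.

```python
import numpy as np

# Hypothetical data for a 2-state semi-Markov process under a fixed
# policy R: embedded transition matrix p, mean holding times tau_i(R_i),
# one-stage costs c_i(R_i), and the constant tau of the transformation.
p = np.array([[0.2, 0.8],
              [0.5, 0.5]])
tau_i = np.array([2.0, 4.0])
c = np.array([3.0, 7.0])
tau = 1.0

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # pi P = pi, sum(pi) = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Semi-Markov average cost via the embedded chain:
# g(R) = sum_j c_j pi_j / sum_j tau_j pi_j.
pi = stationary(p)
g = (c @ pi) / (tau_i @ pi)

# Transformed discrete-time model: p_bar as in the data transformation,
# one-step costs c_bar_j = c_j / tau_j(R_j).
p_bar = (tau / tau_i)[:, None] * p + np.diag(1.0 - tau / tau_i)
pi_bar = stationary(p_bar)
g_bar = (c / tau_i) @ pi_bar

print(np.isclose(g, g_bar))  # the two average costs coincide
```

The check also confirms the relation derived above: the stationary probabilities of the transformed chain satisfy $\overline{\pi}_j(R) \propto \tau_j(R_j)\,\pi_j(R)$, and normalizing yields $\gamma = \sum_j \tau_j(R_j)\pi_j(R)$.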