and
$$
\lim_{m\to\infty} \frac{1}{m}\, E(\text{time over the first } m \text{ decision epochs}) = \frac{E(\text{length of one cycle})}{E(\text{number of transitions in one cycle})}.
$$
Together the above three relations yield (7.1.2). The remainder of the proof is
simple. Obviously, we have
$$
E(\text{costs over the first } m \text{ decision epochs}) = \sum_{t=0}^{m-1} \sum_{j\in I} c_j(R_j)\, p^{(t)}_{rj}(R)
$$
and
$$
E(\text{time over the first } m \text{ decision epochs}) = \sum_{t=0}^{m-1} \sum_{j\in I} \tau_j(R_j)\, p^{(t)}_{rj}(R).
$$
Dividing the numerator and the denominator of the right-hand side of (7.1.2) by $m$, letting $m \to \infty$ and using $\lim_{m\to\infty} (1/m)\sum_{t=0}^{m-1} p^{(t)}_{rj}(R) = \pi_j(R)$, the result (7.1.1) follows when the initial state $X_0 = r$. For initial state $X_0 = i$ the result next follows by mimicking the proof of Theorem 3.5.11 and noting that state $r$ will be reached from state $i$ with probability 1 after finitely many transitions.
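Spelling out this division step makes the limiting ratio explicit; in the notation above,
$$
g_r(R) = \lim_{m\to\infty} \frac{(1/m)\sum_{t=0}^{m-1}\sum_{j\in I} c_j(R_j)\, p^{(t)}_{rj}(R)}{(1/m)\sum_{t=0}^{m-1}\sum_{j\in I} \tau_j(R_j)\, p^{(t)}_{rj}(R)} = \frac{\sum_{j\in I} c_j(R_j)\,\pi_j(R)}{\sum_{j\in I} \tau_j(R_j)\,\pi_j(R)},
$$
which is the ratio formula asserted in (7.1.1).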
A stationary policy $R^*$ is said to be average cost optimal if $g_i(R^*) \leq g_i(R)$ for all $i \in I$ and all stationary policies $R$. The algorithms for computing an average
cost optimal policy in the discrete-time Markov decision model can be extended to
the semi-Markov decision model. This will be done in the next section. However,
before doing this, we discuss a data-transformation method that converts the semi-
Markov decision model into a discrete-time Markov decision model such that for
each stationary policy the average cost per time unit in the discrete-time Markov
model is the same as in the semi-Markov model. This is a very useful result. The
data-transformation method is an extension of the uniformization technique for
continuous-time Markov chains discussed in Section 4.5.
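As an aside, the average cost of a fixed stationary policy can be evaluated directly from the ratio formula above. The following Python sketch does this for a small two-state example; the transition matrix, costs and mean transition times are made-up numbers for illustration, not data from the text.

```python
import numpy as np

# Hypothetical two-state semi-Markov example under a fixed stationary
# policy R, leaving an embedded Markov chain with one-step expected
# costs c_j(R_j) and expected transition times tau_j(R_j).
P = np.array([[0.3, 0.7],      # p_ij(R_i): embedded transition matrix
              [0.6, 0.4]])
c = np.array([4.0, 1.0])       # c_j(R_j): expected cost until next epoch
tau = np.array([2.0, 0.5])     # tau_j(R_j): expected time until next epoch

# Stationary distribution pi(R) of the embedded chain: solve pi P = pi
# together with the normalization sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Average cost per time unit of policy R, the ratio formula (7.1.1).
g = (c @ pi) / (tau @ pi)
print(f"pi(R) = {pi}, g(R) = {g:.4f}")
```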
The data-transformation method
First choose a number $\tau$ with
$$
0 < \tau \leq \min_{i,a} \tau_i(a).
$$
Consider now the discrete-time Markov decision model whose basic elements are
given by
$$
\overline{I} = I, \quad \overline{A}(i) = A(i), \quad i \in I,
$$
$$
\overline{c}_i(a) = c_i(a)/\tau_i(a), \quad a \in A(i) \text{ and } i \in I,
$$
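The excerpt breaks off here, before the transformed transition probabilities are listed. To indicate where the construction is heading, the following hedged Python sketch applies the transformation to the two-state example from the previous snippet; for the missing piece it assumes the standard uniformization form $\overline{p}_{ij}(a) = (\tau/\tau_i(a))\,p_{ij}(a)$ for $j \neq i$ and $\overline{p}_{ii}(a) = (\tau/\tau_i(a))\,p_{ii}(a) + 1 - \tau/\tau_i(a)$, and then checks numerically that the average cost per time unit is unchanged, as the text claims.

```python
import numpy as np

# Same hypothetical two-state example and fixed policy R as before.
P = np.array([[0.3, 0.7],
              [0.6, 0.4]])
c = np.array([4.0, 1.0])
tau_i = np.array([2.0, 0.5])

tau = 0.9 * tau_i.min()        # any tau with 0 < tau <= min_{i,a} tau_i(a)
cbar = c / tau_i               # cbar_i(a) = c_i(a) / tau_i(a)

# Assumed transformed transition matrix (standard uniformization form):
# row i is scaled by tau/tau_i, with the leftover mass 1 - tau/tau_i
# placed on the diagonal as a fictitious self-transition.
ratio = tau / tau_i
Pbar = ratio[:, None] * P + np.diag(1.0 - ratio)

# Stationary distribution of the transformed discrete-time chain.
A = np.vstack([Pbar.T - np.eye(2), np.ones(2)])
pibar, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)

# Average cost per time unit in the discrete-time model; it agrees
# with g(R) = 2.0 computed for the semi-Markov model above.
print(f"discrete-time average cost = {pibar @ cbar:.4f}")
```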