CHAPTER 7
Semi-Markov Decision Processes
7.0 INTRODUCTION
The previous chapter dealt with the discrete-time Markov decision model. In this
model, decisions can be made only at fixed epochs t = 0, 1, . . . . However, in
many stochastic control problems the times between the decision epochs are not
constant but random. A possible tool for analysing such problems is the semi-
Markov decision model. In Section 7.1 we discuss the basic elements of this model.
Also, for the optimality criterion of the long-run average cost per time unit, we
give a data-transformation method by which the semi-Markov decision model can
be converted into an equivalent discrete-time Markov decision model. The data-
transformation method enables us to apply the recursive method of value iteration
to the semi-Markov decision model. Section 7.2 summarizes various algorithms for
the computation of an average cost optimal policy.
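In outline, the data transformation just mentioned works as follows; the precise statement and the conditions under which it is valid are given in Section 7.1, and the notation below (with $c_i(a)$ the expected one-step costs, $\tau_i(a)$ the expected transition times and $p_{ij}(a)$ the transition probabilities under action $a$ in state $i$) anticipates that section. Choosing a constant $\tau$ with $0 < \tau \le \min_{i,a} \tau_i(a)$, one replaces the semi-Markov data by the discrete-time data
\[
\bar{c}_i(a) = \frac{c_i(a)}{\tau_i(a)}, \qquad
\bar{p}_{ij}(a) =
\begin{cases}
\dfrac{\tau}{\tau_i(a)}\, p_{ij}(a), & j \neq i, \\[1ex]
\dfrac{\tau}{\tau_i(a)}\, p_{ii}(a) + 1 - \dfrac{\tau}{\tau_i(a)}, & j = i.
\end{cases}
\]
Under this transformation each stationary policy has the same long-run average cost per time unit in both models, so that the two models share the same average cost optimal policies.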
In Section 7.3 we discuss the value-iteration algorithm for a semi-Markov decision
model in which the times between the decision epochs are exponentially distributed.
For this particular case the computational effort of the value-iteration algorithm can
be reduced considerably by introducing fictitious decision epochs. This simple trick
creates sparse transition matrices, leading to a much more effective value-iteration
algorithm.
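To see why this sparsity pays off, consider the toy sketch below (in Python with NumPy and SciPy). The tridiagonal transition structure is a stand-in of our own and not the model of Section 7.3: when a fictitious decision epoch is inserted at every exponential event in a controlled single-server queue, each state can only move to a neighbouring state or stay put, so one value-iteration update costs time proportional to the number of nonzero transition probabilities rather than to the square of the number of states.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy illustration (our construction): with a fictitious decision epoch at
# every arrival or service completion, state i can only reach i-1, i or i+1.
n = 10_000
rows, cols, vals = [], [], []
for i in range(n):
    for j, p in ((max(i - 1, 0), 0.3), (i, 0.4), (min(i + 1, n - 1), 0.3)):
        rows.append(i); cols.append(j); vals.append(p)
# csr_matrix sums duplicate (i, j) entries, so the boundary rows still sum to 1.
P = csr_matrix((vals, (rows, cols)), shape=(n, n))
c = np.ones(n)  # placeholder one-step costs

# One value-iteration update for a fixed policy; with a sparse P this costs
# O(number of nonzeros) = O(3n) instead of O(n^2) per iteration.
v = np.zeros(n)
v_new = c + P @ v
```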
Section 7.4 illustrates how value iteration in combination with an embedding idea
can be used in the optimization of queues. The semi-Markov decision model is a
very useful tool for optimal control in queueing
systems. In Section 7.5 we will exploit a remarkable feature of the policy-iteration
algorithm, namely that the algorithm typically achieves its largest improvements in
costs in the first few iterations. This finding is sometimes useful for attacking the
curse of dimensionality in applications with a multidimensional state space. The idea
is first to determine the relative values for a reasonable starting policy and then to
apply a single policy-improvement step. This heuristic approach will be illustrated
for a dynamic routing problem.
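To fix ideas, the following minimal sketch (in Python with NumPy) shows the two ingredients of this heuristic for a discrete-time model; the function names are ours, and we assume for simplicity a unichain structure and that every action is admissible in every state.

```python
import numpy as np

def relative_values(P, c, ref=0):
    # Solve the Poisson equations  v_i = c_i - g + sum_j P[i, j] * v_j
    # for a fixed policy with transition matrix P (n x n) and cost vector c,
    # under the normalization v[ref] = 0.  Returns (g, v) with g the
    # long-run average cost and v the relative values.
    n = len(c)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0              # coefficient of the average cost g
    A[:n, 1:] = np.eye(n) - P   # coefficients of the relative values v
    b[:n] = c
    A[n, 1 + ref] = 1.0         # normalization equation v[ref] = 0
    x = np.linalg.solve(A, b)
    return x[0], x[1:]

def improve_once(P_a, c_a, v):
    # Single policy-improvement step: in every state choose an action that
    # minimizes c_i(a) + sum_j p_ij(a) * v_j.  P_a maps each action to its
    # (n, n) transition matrix and c_a to its (n,) cost vector.
    actions = list(P_a)
    scores = np.array([c_a[a] + P_a[a] @ v for a in actions])  # (A, n)
    return [actions[k] for k in scores.argmin(axis=0)]
```

Starting from a reasonable policy $R_0$, one computes $(g, v)$ with relative_values and calls improve_once exactly once to obtain the heuristic policy. For the semi-Markov model itself the term $-g$ in the Poisson equations becomes $-g\,\tau_i(R_0(i))$, or one first applies the data transformation of Section 7.1.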