
CHAPTER 7

                     Semi-Markov Decision Processes








                                      7.0  INTRODUCTION

                The previous chapter dealt with the discrete-time Markov decision model. In this
                model, decisions can be made only at fixed epochs t = 0, 1, . . . . However, in
                many stochastic control problems the times between the decision epochs are not
                constant but random. A possible tool for analysing such problems is the semi-
                Markov decision model. In Section 7.1 we discuss the basic elements of this model.
                Also, for the optimality criterion of the long-run average cost per time unit, we
                give a data-transformation method by which the semi-Markov decision model can
                be converted into an equivalent discrete-time Markov decision model. The data-
                transformation method enables us to apply the recursive method of value-iteration
                to the semi-Markov decision model. Section 7.2 summarizes various algorithms for
                the computation of an average cost optimal policy.
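                To make the data-transformation idea concrete, the following is a minimal sketch in Python. It assumes the standard form of the transformation, with one-step costs c_i(a), expected transition times tau_i(a), transition probabilities p_ij(a), and a constant tau satisfying 0 < tau <= min tau_i(a); the function name and array layout are illustrative and not taken from the text.

```python
import numpy as np

def transform_smdp(c, tau_ia, p, tau):
    """Convert a semi-Markov decision model into an equivalent
    discrete-time model for the average-cost criterion (sketch).

    c[i, a]      : expected cost when action a is chosen in state i
    tau_ia[i, a] : expected time until the next decision epoch
    p[i, a, j]   : probability of moving from state i to state j
    tau          : constant with 0 < tau <= min over (i, a) of tau_ia[i, a]
    """
    n_states, n_actions = c.shape
    c_bar = c / tau_ia                       # costs per unit time
    p_bar = np.empty_like(p)
    for i in range(n_states):
        for a in range(n_actions):
            p_bar[i, a, :] = (tau / tau_ia[i, a]) * p[i, a, :]
            # extra holding probability keeps each row summing to one
            p_bar[i, a, i] += 1.0 - tau / tau_ia[i, a]
    return c_bar, p_bar
```

                In the transformed model the long-run average cost of each stationary policy is unchanged, which is what allows value iteration for the discrete-time model to be applied to the semi-Markov problem.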
                  In Section 7.3 we discuss the value-iteration algorithm for a semi-Markov deci-
                sion model in which the times between the decision epochs are exponentially
                distributed. For this particular case the computational effort of the value-iteration
                algorithm can be considerably reduced by introducing fictitious decision epochs.
                This simple trick creates sparse transition matrices leading to a much more effec-
                tive value-iteration algorithm. Section 7.4 illustrates how value iteration in com-
                bination with an embedding idea can be used in the optimization of queues. The
                semi-Markov decision model is a very useful tool for optimal control in queueing
                systems. In Section 7.5 we will exploit a remarkable feature of the policy-iteration
                algorithm, namely that the algorithm typically achieves its largest improvements in
                costs in the first few iterations. This finding can sometimes be used to attack the
                curse of dimensionality in applications with a multidimensional state space. The idea
                is first to determine the relative values for a reasonable starting policy and then to
                apply a single policy-improvement step. This heuristic approach will be illustrated
                by a dynamic routing problem.
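                As a rough illustration of this single-step heuristic, the sketch below computes the relative values of a given stationary policy and then performs one improvement step, using the same arrays c, tau_ia and p as above. It assumes a unichain model and uses the normalization v_0 = 0; the function name and the way the linear equations are set up are illustrative, not the book's own code.

```python
import numpy as np

def one_step_improvement(c, tau_ia, p, policy):
    """Relative values of a stationary policy followed by a single
    policy-improvement step (semi-Markov model, average-cost criterion).

    c[i, a], tau_ia[i, a], p[i, a, j] as before; policy[i] is the action
    prescribed in state i.  Assumes a unichain model.
    """
    n = c.shape[0]
    idx = np.arange(n)
    P = p[idx, policy, :]            # transition matrix under the policy
    c_R = c[idx, policy]
    tau_R = tau_ia[idx, policy]

    # Value-determination equations: v_i = c_i - g*tau_i + sum_j p_ij v_j,
    # with the normalization v_0 = 0; unknowns are (g, v_1, ..., v_{n-1}).
    A = np.zeros((n, n))
    A[:, 0] = tau_R                  # coefficient of the average cost g
    A[:, 1:] = np.eye(n)[:, 1:] - P[:, 1:]
    sol = np.linalg.solve(A, c_R)
    g, v = sol[0], np.concatenate(([0.0], sol[1:]))

    # Single policy-improvement step: minimize the test quantity
    # c_i(a) - g*tau_i(a) + sum_j p_ij(a) v_j over the actions a.
    test = c - g * tau_ia + p @ v
    return np.argmin(test, axis=1), g, v
```

                The rationale, as noted above, is that policy iteration typically achieves its largest cost improvements in the first few iterations, so a single improvement step applied to a sensible starting policy already captures most of the gain.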


