
CHAPTER 7

                     Semi-Markov Decision Processes








                                      7.0  INTRODUCTION

                The previous chapter dealt with the discrete-time Markov decision model. In this
                model, decisions can be made only at fixed epochs t = 0, 1, . . . . However, in
                many stochastic control problems the times between the decision epochs are not
                constant but random. A possible tool for analysing such problems is the semi-
                Markov decision model. In Section 7.1 we discuss the basic elements of this model.
                Also, for the optimality criterion of the long-run average cost per time unit, we
                give a data-transformation method by which the semi-Markov decision model can
                be converted into an equivalent discrete-time Markov decision model. The data-
                transformation method enables us to apply the recursive method of value-iteration
                to the semi-Markov decision model. Section 7.2 summarizes various algorithms for
                the computation of an average cost optimal policy.
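                To make the data-transformation idea concrete, the following is a minimal sketch in Python. It assumes the standard form of the transformation, with one-step costs c_i(a), expected transition times tau_i(a), transition probabilities p_ij(a), and a constant tau satisfying 0 < tau <= min tau_i(a); the function name and array layout are illustrative and not taken from the text.

```python
import numpy as np

def transform_smdp(c, tau_ia, p, tau):
    """Convert a semi-Markov decision model into an equivalent
    discrete-time model for the average-cost criterion (sketch).

    c[i, a]      : expected cost when action a is chosen in state i
    tau_ia[i, a] : expected time until the next decision epoch
    p[i, a, j]   : probability of moving from state i to state j
    tau          : constant with 0 < tau <= min over (i, a) of tau_ia[i, a]
    """
    n_states, n_actions = c.shape
    c_bar = c / tau_ia                       # costs per unit time
    p_bar = np.empty_like(p)
    for i in range(n_states):
        for a in range(n_actions):
            p_bar[i, a, :] = (tau / tau_ia[i, a]) * p[i, a, :]
            # extra holding probability keeps each row summing to one
            p_bar[i, a, i] += 1.0 - tau / tau_ia[i, a]
    return c_bar, p_bar
```

                In the transformed model the long-run average cost of each stationary policy is unchanged, which is what allows value iteration for the discrete-time model to be applied to the semi-Markov problem.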
                  In Section 7.3 we discuss the value-iteration algorithm for a semi-Markov deci-
                sion model in which the times between the decision epochs are exponentially
                distributed. For this particular case the computational effort of the value-iteration
                algorithm can be considerably reduced by introducing fictitious decision epochs.
                This simple trick creates sparse transition matrices leading to a much more effec-
                tive value-iteration algorithm. Section 7.4 illustrates how value iteration in com-
                bination with an embedding idea can be used in the optimization of queues. The
                semi-Markov decision model is a very useful tool for optimal control in queueing
                systems. In Section 7.5 we will exploit a remarkable feature of the policy-iteration
                algorithm, namely that the algorithm typically achieves its largest improvements in
                costs in the first few iterations. This finding can sometimes be used to attack the
                curse of dimensionality in applications with a multidimensional state space. The idea
                is first to determine the relative values for a reasonable starting policy and then to
                apply a single policy-improvement step. This heuristic approach will be illustrated
                by a dynamic routing problem.
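                As a rough illustration of this single-step heuristic, the sketch below computes the relative values of a given stationary policy and then performs one improvement step, using the same arrays c, tau_ia and p as above. It assumes a unichain model and uses the normalization v_0 = 0; the function name and the way the linear equations are set up are illustrative, not the book's own code.

```python
import numpy as np

def one_step_improvement(c, tau_ia, p, policy):
    """Relative values of a stationary policy followed by a single
    policy-improvement step (semi-Markov model, average-cost criterion).

    c[i, a], tau_ia[i, a], p[i, a, j] as before; policy[i] is the action
    prescribed in state i.  Assumes a unichain model.
    """
    n = c.shape[0]
    idx = np.arange(n)
    P = p[idx, policy, :]            # transition matrix under the policy
    c_R = c[idx, policy]
    tau_R = tau_ia[idx, policy]

    # Value-determination equations: v_i = c_i - g*tau_i + sum_j p_ij v_j,
    # with the normalization v_0 = 0; unknowns are (g, v_1, ..., v_{n-1}).
    A = np.zeros((n, n))
    A[:, 0] = tau_R                  # coefficient of the average cost g
    A[:, 1:] = np.eye(n)[:, 1:] - P[:, 1:]
    sol = np.linalg.solve(A, c_R)
    g, v = sol[0], np.concatenate(([0.0], sol[1:]))

    # Single policy-improvement step: minimize the test quantity
    # c_i(a) - g*tau_i(a) + sum_j p_ij(a) v_j over the actions a.
    test = c - g * tau_ia + p @ v
    return np.argmin(test, axis=1), g, v
```

                The rationale, as noted above, is that policy iteration typically achieves its largest cost improvements in the first few iterations, so a single improvement step applied to a sensible starting policy already captures most of the gain.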


