
CHAPTER 6

Discrete-Time Markov Decision Processes








                                      6.0  INTRODUCTION

In the previous chapters we saw that in the analysis of many operational systems the concepts of a state of a system and a state transition are of basic importance. For dynamic systems with a given probabilistic law of motion, the simple Markov model is often appropriate. However, in many situations with uncertainty and dynamism, the state transitions can be controlled by taking a sequence of actions. The Markov decision model is a versatile and powerful tool for analysing probabilistic sequential decision processes with an infinite planning horizon. This model is an outgrowth of the Markov model and dynamic programming. The latter concept, developed by Bellman in the early 1950s, is a computational approach for analysing sequential decision processes with a finite planning horizon. The basic ideas of dynamic programming are states, the principle of optimality and functional equations.
In fact, dynamic programming is a recursive procedure for calculating optimal value functions from a functional equation. This functional equation reflects the principle of optimality, which states that an optimal policy has the property that, whatever the initial state and the initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first transition. This principle is always valid when the number of states and the number of actions are finite; a sketch of the resulting recursion is given below.
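To make the recursion concrete, here is a minimal sketch of such a functional equation for a finite-horizon cost-minimization problem. The notation is assumed for illustration only, anticipating the formal model developed later in this chapter: V_n(i) denotes the minimal expected cost when n decision epochs remain and the current state is i, A(i) is the set of actions available in state i, c_i(a) is the one-step cost and p_{ij}(a) the transition probabilities under action a:

    V_n(i) = \min_{a \in A(i)} \Big\{ c_i(a) + \sum_{j} p_{ij}(a)\, V_{n-1}(j) \Big\}, \qquad n = 1, 2, \ldots,

with V_0(i) a given terminal cost. Computing V_1, V_2, ... backwards from V_0 is precisely the recursive procedure referred to above, and the equation itself expresses the principle of optimality: the term V_{n-1}(j) presumes optimal behaviour from the next state j onwards.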
At much the same time as Bellman (1957) popularized dynamic programming, Howard (1960) used basic principles from Markov chain theory and dynamic programming to develop a policy-iteration algorithm for solving probabilistic sequential decision processes with an infinite planning horizon; a schematic sketch of the idea follows this paragraph. In the two decades following the pioneering work of Bellman and Howard, the theory of Markov decision processes expanded rapidly and a powerful solution technology developed. However, in that period relatively little effort was put into applying this quite useful model to practical problems.
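To convey the flavour of policy iteration (as a sketch only, not a reproduction of Howard's original formulation), the following Python fragment alternates a value-determination step and a policy-improvement step for a discounted-cost Markov decision problem. The data layout (a list P of per-action transition matrices, a cost array c) and the discount factor beta are assumptions made purely for this illustration:

    import numpy as np

    def policy_iteration(P, c, beta=0.9):
        """Illustrative policy iteration, discounted-cost criterion.

        P: list of (n x n) arrays; P[a][i, j] is the probability of
           moving from state i to state j when action a is taken.
        c: (n x m) array; c[i, a] is the one-step cost of action a
           in state i.
        beta: discount factor with 0 < beta < 1 (assumed here).
        """
        n_states, n_actions = c.shape
        policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
        while True:
            # Value determination: solve the linear equations
            #   V(i) = c_i(R(i)) + beta * sum_j p_ij(R(i)) V(j)
            # for the value function V of the current policy R.
            P_R = np.array([P[policy[i]][i] for i in range(n_states)])
            c_R = c[np.arange(n_states), policy]
            V = np.linalg.solve(np.eye(n_states) - beta * P_R, c_R)
            # Policy improvement: take the minimizing action per state.
            Q = np.array([[c[i, a] + beta * P[a][i] @ V
                           for a in range(n_actions)]
                          for i in range(n_states)])
            new_policy = Q.argmin(axis=1)
            if np.array_equal(new_policy, policy):  # no change: optimal
                return policy, V
            policy = new_policy

Each iteration produces a policy that is at least as good as the previous one, and since there are finitely many stationary policies the procedure terminates; this monotone-improvement property is the core of Howard's approach.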


