
bounds on the minimal average cost and on the average cost of the policies generated by the algorithm. These authors extended the original value-iteration bounds of MacQueen (1966) for the discounted cost case to the average cost case. The modified value-iteration algorithm with a dynamic relaxation factor comes from Popyack et al. (1979). The first proof of the geometric convergence of the undiscounted value-iteration algorithm was given by White (1963) under a very strong recurrence condition. The proof in Section 6.7 is along the same lines as the proof given in Van der Wal (1980). General proofs for the geometric convergence of value-iteration can be found in the papers of Bather (1973) and Schweitzer and Federgruen (1979). These papers demonstrate the depth and the beauty of the mathematics underlying the average cost criterion. In general there is a rich mathematical theory behind the Markov decision model. A good account of this theory can be found in the books of Hernandez-Lerma and Lasserre (1996), Puterman (1994) and Sennott (1999). A recommended reference for constrained Markov decision processes is the book of Altman (1999).
  The Markov decision model finds applications in a wide variety of fields. Golabi et al. (1982), Kawai (1983), Stengos and Thomas (1980) and Tijms and Van der Duyn Schouten (1985) give applications to replacement and maintenance problems. Norman and Shearn (1980) and Kolderman and Volgenant (1985) discuss applications to insurance, and Su and Deininger (1972) give an application to water-resource control. Applications to control problems in telecommunication are mentioned in the next chapter. A survey of real applications of Markov decision models can be found in White (1985).


                                          REFERENCES

                Altman, E. (1999) Constrained Markov Decision Processes. Chapman and Hall, London.
                Bather, J. (1973) Optimal decision procedures for finite Markov chains. Adv. Appl. Prob.,
                  5, 521–540.
                Bellman, R. (1957) Dynamic Programming. Princeton University Press, Princeton NJ.
                Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist., 33, 719–726.
De Ghellinck, G. (1960) Les problèmes de décisions séquentielles. Cahiers Centre Études
  Recherche Opér., 2, 161–179.
                Denardo, E.V. (1982) Dynamic Programming. Prentice Hall, Englewood Cliffs NJ.
                Denardo, E.V. and Fox, B.L. (1968) Multichain Markov renewal programs. SIAM J. Appl.
                  Math., 16, 468–487.
                Derman, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.
                Federgruen, A. and Zipkin, P. (1984) An efficient algorithm for computing optimal (s, S)
                  policies. Operat. Res., 32, 1268–1285.
                Golabi, K., Kulkarni, R.B. and Way, C.B. (1982) A statewide pavement management sys-
                  tem. Interfaces, 12, no. 6, 5–21.
                Hastings, N.A.J. (1971) Bounds on the gain of a Markov decision process. Operat. Res., 19,
                  240–244.
                Hernandez-Lerma, O. and Lasserre, J.B. (1996) Discrete-Time Markov Control Processes.
                  Springer-Verlag, Berlin.