bounds on the minimal average cost and on the average cost of the policies generated by the algorithm. These authors extended the original value-iteration bounds of MacQueen (1966) for the discounted cost case to the average cost case. The modified value-iteration algorithm with a dynamic relaxation factor comes from Popyack et al. (1979). The first proof of the geometric convergence of the undiscounted value-iteration algorithm was given by White (1963) under a very strong recurrence condition. The proof in Section 6.7 is along the same lines as the proof given in Van der Wal (1980). General proofs for the geometric convergence of value-iteration can be found in the papers of Bather (1973) and Schweitzer and Federgruen (1979). These papers demonstrate the depth and beauty of the mathematics underlying the average cost criterion. In general there is a rich mathematical theory behind the Markov decision model. A good account of this theory can be found in the books of Hernandez-Lerma and Lasserre (1996), Puterman (1994) and Sennott (1999). A recommended reference for constrained Markov decision processes is the book of Altman (1999).
The Markov decision model finds applications in a wide variety of fields. Golabi et al. (1982), Kawai (1983), Stengos and Thomas (1980) and Tijms and Van der Duyn Schouten (1985) give applications to replacement and maintenance problems. Norman and Shearn (1980) and Kolderman and Volgenant (1985) discuss applications to insurance, and Su and Deininger (1972) give an application to water-resource control. Applications to control problems in telecommunication are mentioned in the next chapter. A survey of real applications of Markov decision models can be found in White (1985).
REFERENCES
Altman, E. (1999) Constrained Markov Decision Processes. Chapman and Hall, London.
Bather, J. (1973) Optimal decision procedures for finite Markov chains. Adv. Appl. Prob.,
5, 521–540.
Bellman, R. (1957) Dynamic Programming. Princeton University Press, Princeton NJ.
Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist., 33, 719–726.
De Ghellinck, G. (1960) Les problèmes de décisions séquentielles. Cahiers Centre Études Recherche Opér., 2, 161–179.
Denardo, E.V. (1982) Dynamic Programming. Prentice Hall, Englewood Cliffs NJ.
Denardo, E.V. and Fox, B.L. (1968) Multichain Markov renewal programs. SIAM J. Appl.
Math., 16, 468–487.
Derman, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.
Federgruen, A. and Zipkin, P. (1984) An efficient algorithm for computing optimal (s, S)
policies. Operat. Res., 32, 1268–1285.
Golabi, K., Kulkarni, R.B. and Way, C.B. (1982) A statewide pavement management system. Interfaces, 12, no. 6, 5–21.
Hastings, N.A.J. (1971) Bounds on the gain of a Markov decision process. Operat. Res., 19,
240–244.
Hernandez-Lerma, O. and Lasserre, J.B. (1996) Discrete-Time Markov Control Processes.
Springer-Verlag, Berlin.