Page 310 - A First Course In Stochastic Models
304 SEMI-MARKOV DECISION PROCESSES
7.14 Consider Exercise 7.13 again, but assume now that the residency times have a Coxian-2
distribution. Develop a value-iteration algorithm to compute the total average rejection rate
under a fixed reservation policy. A reservation policy is characterized by two integers r_1 and
r_2 with r_1 ≥ b_1 and r_2 ≥ b_2. Under the reservation policy an arriving customer of type i is
accepted only if r_i or more service units are available. Verify experimentally that the total
average rejection rate for a fixed reservation policy is nearly insensitive to the second and
higher moments of the residency times. For the case of exponentially distributed residency
times, take the average rejection rate of the best reservation policy and verify how close it
is to the theoretically minimal average rejection rate.
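A value-iteration scheme of the kind asked for can be sketched as follows for the simpler case of exponentially distributed residency times mentioned at the end of the exercise; for Coxian-2 residency times the state would additionally record the service phase of each customer in progress. All numerical parameters (C, b_i, the arrival and residency rates, and the reservation levels) are illustrative assumptions, not data from the exercise.

```python
# Sketch (assumed data): value iteration for the total average rejection rate
# under a fixed reservation policy (r_1, r_2), with exponentially distributed
# residency times.  The Coxian-2 case would enlarge the state with a phase
# indicator per customer in service.

C = 10            # total number of service units (assumed)
b = (1, 2)        # units required by a type-1 / type-2 customer (assumed)
lam = (3.0, 1.0)  # Poisson arrival rates (assumed)
mu = (1.0, 0.5)   # residency-time rates, mean 1/mu[i] (assumed)
r = (1, 3)        # reservation levels, r[i] >= b[i]

# State (n1, n2): numbers of customers of each type present.
states = [(n1, n2)
          for n1 in range(C // b[0] + 1)
          for n2 in range(C // b[1] + 1)
          if n1 * b[0] + n2 * b[1] <= C]

# Uniformization constant bounding the total event rate in any state.
Lam = sum(lam) + (C // b[0]) * mu[0] + (C // b[1]) * mu[1]

def step(V):
    """One value-iteration step on the uniformized chain; the one-step cost
    is the rejection rate in the state divided by Lam."""
    Vnew = {}
    for s in states:
        free = C - s[0] * b[0] - s[1] * b[1]
        cost, acc, used = 0.0, 0.0, 0.0
        for i in range(2):
            if free >= r[i]:      # an arriving type-i customer is accepted
                nxt = (s[0] + 1, s[1]) if i == 0 else (s[0], s[1] + 1)
                acc += lam[i] * V[nxt]
                used += lam[i]
            else:                 # rejections accrue at rate lam[i]
                cost += lam[i]
            if s[i] > 0:          # a type-i customer departs
                nxt = (s[0] - 1, s[1]) if i == 0 else (s[0], s[1] - 1)
                acc += s[i] * mu[i] * V[nxt]
                used += s[i] * mu[i]
        Vnew[s] = (cost + acc + (Lam - used) * V[s]) / Lam
    return Vnew

V = {s: 0.0 for s in states}
for _ in range(20000):
    Vnew = step(V)
    diffs = [Vnew[s] - V[s] for s in states]
    V = Vnew
    if max(diffs) - min(diffs) < 1e-10:
        break

g = Lam * 0.5 * (max(diffs) + min(diffs))   # total average rejection rate
print(f"average rejection rate under r = {r}: {g:.6f}")
```

The insensitivity experiment would repeat this computation with the exponential residency times replaced by Coxian-2 phases matched on the mean and compare the resulting rejection rates over a grid of reservation levels.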
7.15 Consider a production/inventory system with N inventory points that share a common
production unit. At the beginning of each period, the production unit can produce for any
number of inventory points, with the stipulation that the total production size is restricted by
the capacity C of the production unit. The production time is negligible for any production
scheme. The demands at the various inventory points are independent of each other. In
each period the demand at inventory point j is Poisson distributed with mean µ_j for
j = 1, …, N. Excess demand is lost at any inventory point. The following costs are involved.
The cost of producing z_j units for inventory point j equals K_j + c_j z_j for z_j > 0, regardless
of how much is produced for each of the other inventory points. In any period there is a
holding cost of h_j > 0 for each unit in stock at inventory point j at the end of the period.
A stockout cost of p_j is incurred for each unit of lost demand at inventory point j. Can you
think of a heuristic approach based on solving N one-dimensional problems and performing
a single policy-improvement step? This problem is based on Wijngaard (1979).
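One possible reading of the hint can be sketched as follows (an illustration, not Wijngaard's method verbatim): solve each inventory point in isolation by average-cost relative value iteration, then perform a single policy-improvement step in which the capacity C is split over the points by a knapsack-style dynamic program on the separable one-dimensional Q-functions. All numbers, the stock cap M, and the helper names are assumptions introduced for the sketch.

```python
# Decomposition heuristic (sketch, assumed data): N one-dimensional problems
# plus a single capacity-constrained policy-improvement step.
import math

def poisson_pmf(d, mu):
    return math.exp(-mu) * mu ** d / math.factorial(d)

def solve_one_point(mu, K, c, h, p, M, tol=1e-9, max_iters=3000):
    """Average-cost relative value iteration for one inventory point alone.
    State x: stock at the start of a period; action z: production, x+z <= M.
    Returns (g, V, Q): average cost, relative values, last Q-factors."""
    pmf = [poisson_pmf(d, mu) for d in range(M + 1)]
    ptail = max(0.0, 1.0 - sum(pmf))   # demand beyond M empties the stock
    # expected holding + stockout cost when the stock is raised to level y,
    # using E[(D - y)^+] = mu - y + E[(y - D)^+]
    pc = []
    for y in range(M + 1):
        on_hand = sum((y - d) * pmf[d] for d in range(y))
        pc.append(h * on_hand + p * (mu - y + on_hand))
    V = [0.0] * (M + 1)
    for _ in range(max_iters):
        Q = [[(K + c * z if z > 0 else 0.0) + pc[x + z]
              + sum(pmf[d] * V[max(x + z - d, 0)] for d in range(M + 1))
              + ptail * V[0]
              for z in range(M - x + 1)] for x in range(M + 1)]
        g = min(Q[0])                        # normalize so that V[0] = 0
        Vnew = [min(row) - g for row in Q]
        done = max(abs(a - v) for a, v in zip(Vnew, V)) < tol
        V = Vnew
        if done:
            break
    return g, V, Q

def improvement_step(xs, Qs, C):
    """Single policy-improvement step: split capacity C over the points by a
    small knapsack DP on the separable one-dimensional Q-functions."""
    N, INF = len(xs), float("inf")
    best = [[0.0] * (C + 1)] + [[INF] * (C + 1) for _ in range(N)]
    choice = [[0] * (C + 1) for _ in range(N)]
    for j in range(N):
        for k in range(C + 1):
            for z in range(min(k, len(Qs[j][xs[j]]) - 1) + 1):
                cand = best[j][k - z] + Qs[j][xs[j]][z]
                if cand < best[j + 1][k]:
                    best[j + 1][k] = cand
                    choice[j][k] = z
    zs, k = [0] * N, C
    for j in range(N - 1, -1, -1):           # backtrack the chosen quantities
        zs[j] = choice[j][k]
        k -= zs[j]
    return zs

# Two inventory points sharing capacity C = 6 (all data assumed).
params = [dict(mu=2.0, K=3.0, c=1.0, h=1.0, p=4.0, M=8),
          dict(mu=1.0, K=2.0, c=1.5, h=0.5, p=6.0, M=8)]
solved = [solve_one_point(**q) for q in params]
Qs = [Q for (_, _, Q) in solved]
zs = improvement_step(xs=(0, 0), Qs=Qs, C=6)
print("heuristic production quantities:", zs)
print("decoupled average costs:", [round(g, 4) for (g, _, _) in solved])
```

The improvement step uses the sum of the one-dimensional relative value functions as a separable approximation to the joint value function; the capacity coupling then reduces each period's decision to a one-dimensional allocation problem.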
BIBLIOGRAPHIC NOTES
Semi-Markov decision processes were introduced in De Cani (1964), Howard
(1964), Jewell (1963) and Schweitzer (1965). The semi-Markov decision model
has many applications, especially in queueing control. The data-transformation
method converting a semi-Markov decision model into an equivalent discrete-time
Markov decision model was introduced in Schweitzer (1971). This uniformization
method was used in the paper of Lippman (1975) to establish the structure of
optimal control rules in queueing applications of continuous-time Markov decision
processes with exponentially distributed times between the decision epochs. The
idea of using fictitious decision epochs is also contained in this paper. The embed-
ding idea used in Section 7.4 is adapted from De Leve et al. (1977); see also Tijms
(1980). Embedding is especially useful for developing a tailor-made policy-iteration
algorithm that operates on a subclass of structured policies. The heuristic approach
of attacking a multidimensional Markov decision problem through decomposition
and a single policy-improvement step goes back to Norman (1970) and has been
successfully applied in Krishnan and Ott (1986, 1987) and Wijngaard (1979), among others.
The heuristic solution for the dynamic routing problem from Example 7.5.1 comes
from Krishnan and Ott (1987) and has been extended in Sassen et al. (1997) to
the case of general service times. Other heuristic approaches to handle large-scale
Markov decision processes are discussed in Cooper et al. (2003) and Schweitzer
and Seidman (1985).