Develop a value-iteration algorithm to compute an average-cost optimal claim rule. (Hint:
take the beginning of each month as the decision epochs and let the action a = d mean
that damage in the coming month will be claimed only if this damage exceeds the level d.
Define the state of the system as (t, i) with t = 0, ..., 12 and i = 1, ..., 6, where t denotes
the number of months until the next premium payment and the indicator variable i refers
to the status of the next premium payment. Explain why you need no data transformation
to handle the periodicities, but can use the bounds $m_n = \min_i \{V_{12n}(0, i) - V_{12(n-1)}(0, i)\}$
and $M_n = \max_i \{V_{12n}(0, i) - V_{12(n-1)}(0, i)\}$, where $V_{12n+t}(t, i)$ is defined as the minimal
total expected cost if the motorist still has an insurance contract for $t + 12n$ months and the
present state is (t, i).)
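As a rough illustration of how the hinted scheme might be organized, here is a minimal Python skeleton of value iteration for a periodic Markov decision model. The callables `actions(s)`, `transitions(s, a)` and `cost(s, a)` are hypothetical placeholders (the exercise's claim-size and premium data are not reproduced here); the point is only that the bounds are computed once every 12 iterations, over the states with t = 0, as the hint suggests.

```python
# Skeleton sketch only: `states` are the (t, i) pairs; actions(s),
# transitions(s, a) -> [(next_state, prob), ...] and cost(s, a) are
# hypothetical placeholders to be filled in from the exercise data.
def periodic_value_iteration(states, actions, transitions, cost, eps=1e-6):
    V = {s: 0.0 for s in states}          # V_0
    V_prev_cycle = dict(V)                # V_{12(n-1)}(., .)
    k = 0
    while True:
        # one dynamic-programming step: V_k -> V_{k+1}
        V = {s: min(cost(s, a) + sum(q * V[s2] for s2, q in transitions(s, a))
                    for a in actions(s))
             for s in states}
        k += 1
        if k % 12 == 0:
            # the bounds m_n, M_n from the hint, using only the states with t = 0
            diffs = [V[s] - V_prev_cycle[s] for s in states if s[0] == 0]
            m_n, M_n = min(diffs), max(diffs)
            if M_n - m_n <= eps * abs(m_n):   # bounds close enough: stop
                return m_n, M_n, V
            V_prev_cycle = dict(V)
```

Because the bounds compare value functions that are exactly one 12-month cycle apart, the periodicity never enters the stopping test, which is why no data transformation is needed.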
6.10 The stock level of a given product is reviewed at the beginning of each week. Upon
review a decision has to be made whether or not to replenish the stock level. The stock can
be replenished up to level M, the maximum amount that can be held in stock. The lead time
of a replenishment order is negligible. The demands for the product in the successive weeks
are independent random variables having a Poisson distribution with a given mean µ. Any
demand occurring when the system is out of stock is lost. The following costs are involved.
For each replenishment order there is a fixed set-up cost of K > 0 and a variable ordering
cost of c ≥ 0 for each unit ordered. In each week a holding cost of h > 0 is charged against
each unit in stock at the end of the week. A penalty cost of p > 0 is incurred for each unit
of demand that is lost. The problem is to find a stock control rule minimizing the long-run
average cost per week.
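For concreteness, here is a minimal sketch of the one-week expected cost under this cost structure; the function name and argument layout are my own, and the week is assumed to start with `order_up_to` units on hand after any replenishment.

```python
from scipy.stats import poisson

def expected_week_cost(start_stock, order_up_to, mu, K, c, h, p):
    """One-week expected cost: set-up plus variable ordering cost, holding
    cost h on the expected end-of-week stock E[(y - D)^+], and penalty p on
    the expected lost demand E[(D - y)^+], with y = order_up_to and D the
    Poisson(mu) weekly demand."""
    y, q = order_up_to, order_up_to - start_stock
    order_cost = K + c * q if q > 0 else 0.0
    pmf = [poisson.pmf(d, mu) for d in range(y + 1)]
    exp_end_stock = sum((y - d) * pmf[d] for d in range(y + 1))  # E[(y - D)^+]
    exp_lost = mu - y + exp_end_stock    # E[(D - y)^+] = mu - E[min(D, y)]
    return order_cost + h * exp_end_stock + p * exp_lost
```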
(a) Use value iteration to solve for the numerical data M = 100, µ = 25, K = 64,
c = 0, h = 1 and p = 5. Also try other numerical examples and verify experimentally that
the optimal control rule is always of the (s, S) type when the maximum stock level M is
sufficiently large. Under an (s, S) policy with s ≤ S, the inventory position is ordered up to
the level S whenever the inventory position at a review is below the reorder point s; otherwise,
no ordering is done. Using the flexibility in the policy-improvement procedure, Federgruen and
Zipkin (1984) developed a tailor-made policy-iteration algorithm that generates a sequence
of improved policies within the class of (s, S) policies.
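A minimal value-iteration sketch for this instance, assuming the standard average-cost value-iteration bounds from this chapter; the array layout and names are my own choices, not a definitive implementation.

```python
import numpy as np
from scipy.stats import poisson

M, mu, K, c, h, p = 100, 25.0, 64.0, 0.0, 1.0, 5.0

# Transition law and one-week expected cost for each order-up-to level a:
# the next state is (a - D)^+ with D ~ Poisson(mu).
pmf = poisson.pmf(np.arange(M + 1), mu)
P = np.zeros((M + 1, M + 1))          # P[a, j] = P(next stock = j | level a)
cost_a = np.zeros(M + 1)              # expected holding + penalty at level a
for a in range(M + 1):
    for d in range(a):
        P[a, a - d] += pmf[d]
    P[a, 0] += 1.0 - pmf[:a].sum()    # any demand D >= a empties the stock
    exp_end = sum((a - d) * pmf[d] for d in range(a + 1))   # E[(a - D)^+]
    exp_lost = mu - a + exp_end                             # E[(D - a)^+]
    cost_a[a] = h * exp_end + p * exp_lost

V = np.zeros(M + 1)
eps = 1e-6
while True:
    future = cost_a + P @ V           # expected cost-to-go of choosing level a
    V_new = np.empty(M + 1)
    policy = np.empty(M + 1, dtype=int)
    for i in range(M + 1):            # allowed actions: order up to a >= i
        q = np.array([(K + c * (a - i) if a > i else 0.0) + future[a]
                      for a in range(i, M + 1)])
        policy[i] = i + q.argmin()
        V_new[i] = q.min()
    diff = V_new - V
    m_n, M_n = diff.min(), diff.max() # lower/upper bounds on the average cost
    V = V_new
    if M_n - m_n <= eps * m_n:
        break

print(f"minimal average cost in [{m_n:.4f}, {M_n:.4f}]")
print("order-up-to levels:", policy)  # should exhibit the (s, S) structure
```

In the printed policy, the states i with policy[i] > i are those below the reorder point s, and the common level they order up to is S.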
(b) Suppose the probabilistic constraint is imposed that the long-run fraction of demand
lost should not exceed 1 − β for a prespecified service level β (note that this fraction
equals the average demand lost per week divided by µ). Use linear programming to find an
optimal control rule minimizing the long-run average cost per week subject to this service-level
constraint. Solve for the numerical data β = 0.99, M = 100, µ = 25, K = 64, c = 0, h = 1
and p = 0. Also compare the average cost and the service level of the optimal randomized
policy with the average cost and the service level of the best stationary policy obtained by
the Lagrange-multiplier approach.
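A sketch of the linear program, in the spirit of the formulation used in this chapter, with decision variables x(i, a) interpreted as the long-run fraction of weeks in which the stock at review is i and the order-up-to action a is taken. The variable layout and names are my own, and scipy's linprog with the HiGHS solver is assumed.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import poisson

M, mu, K, c, h, p, beta = 100, 25.0, 64.0, 0.0, 1.0, 0.0, 0.99

pmf = poisson.pmf(np.arange(M + 1), mu)
P = np.zeros((M + 1, M + 1))          # P[a, j] as in part (a)
for a in range(M + 1):
    for d in range(a):
        P[a, a - d] += pmf[d]
    P[a, 0] += 1.0 - pmf[:a].sum()
exp_end = np.array([sum((a - d) * pmf[d] for d in range(a + 1))
                    for a in range(M + 1)])                 # E[(a - D)^+]
exp_lost = mu - np.arange(M + 1) + exp_end                  # E[(D - a)^+]

# one variable x[i, a] per feasible pair with a >= i
pairs = [(i, a) for i in range(M + 1) for a in range(i, M + 1)]
n = len(pairs)
cost = np.array([(K + c * (a - i) if a > i else 0.0)
                 + h * exp_end[a] + p * exp_lost[a] for i, a in pairs])

# equalities: balance equations plus normalization sum x = 1
# (one balance equation is redundant; the solver copes with that)
A_eq = np.zeros((M + 2, n))
for k, (i, a) in enumerate(pairs):
    A_eq[i, k] += 1.0
    A_eq[:M + 1, k] -= P[a]
    A_eq[M + 1, k] = 1.0
b_eq = np.zeros(M + 2)
b_eq[M + 1] = 1.0

# service-level constraint: average fraction of demand lost <= 1 - beta
A_ub = np.array([[exp_lost[a] / mu for i, a in pairs]])
b_ub = np.array([1.0 - beta])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
print("minimal average cost:", res.fun)
# a state i carrying two positive x[i, a] values indicates that the optimal
# policy randomizes there (at most one such state, as there is one constraint)
```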
BIBLIOGRAPHIC NOTES
The policy-iteration method for the discrete-time Markov decision model was
developed in Howard (1960). A theoretical foundation for Howard’s policy-iteration
method was given in Blackwell (1962); see also Denardo and Fox (1968) and
Veinott (1966). Linear programming formulations for the Markov decision model
were first given by De Ghellinck (1960) and Manne (1960) and streamlined later
by Denardo and Fox (1968), Derman (1970) and Hordijk and Kallenberg (1979,
1984). The computational usefulness of the value-iteration algorithm was greatly
enhanced by Odoni (1969) and Hastings (1971), who introduced lower and upper
bounds on the minimal average cost.