Page 282 - A First Course In Stochastic Models


                Develop a value-iteration algorithm to compute an average cost optimal claim rule. (Hint:
                take the beginning of each month as the decision epochs and let the action a = d mean
                that damage in the coming month will be claimed only if this damage exceeds the level d.
                Define the state of the system as (t, i) with t = 0, . . . , 12 and i = 1, . . . , 6, where t denotes
                the number of months until the next premium payment and the indicator variable i refers
                to the status of the next premium payment. Explain why you need no data transformation
to handle periodicities but you can use the bounds m_n = min_i {V_{12n}(0, i) − V_{12(n−1)}(0, i)}
and M_n = max_i {V_{12n}(0, i) − V_{12(n−1)}(0, i)}, where V_{12n+t}(t, i) is defined as the minimal
total expected cost if the motorist still has an insurance contract for t + 12n months and the
present state is (t, i).)
                6.10 The stock level of a given product is reviewed at the beginning of each week. Upon
                review a decision has to be made whether or not to replenish the stock level. The stock can
                be replenished up to level M, the maximum amount that can be held in stock. The lead time
                of a replenishment order is negligible. The demands for the product in the successive weeks
                are independent random variables having a Poisson distribution with a given mean µ. Any
                demand occurring when the system is out of stock is lost. The following costs are involved.
                For each replenishment order there is a fixed set-up cost of K > 0 and a variable ordering
                cost of c ≥ 0 for each unit ordered. In each week a holding cost of h > 0 is charged against
                each unit in stock at the end of the week. A penalty cost of p > 0 is incurred for each unit
                of demand that is lost. The problem is to find a stock control rule minimizing the long-run
                average cost per week.
                  (a) Use value iteration to solve for the numerical data M = 100, µ = 25, K = 64,
                c = 0, h = 1 and p = 5. Also try other numerical examples and verify experimentally that
                the optimal control rule is always of the (s, S) type when the maximum stock level M is
                sufficiently large. Under an (s, S) policy with s ≤ S the inventory position is ordered up to
                the level S when at a review the inventory position is below the reorder point s; otherwise, no
                ordering is done. Using the flexibility in the policy-improvement procedure, Federgruen and
                Zipkin (1984) developed a tailor-made policy-iteration algorithm that generates a sequence
                of improved policies within the class of (s, S) policies.
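The value-iteration computation of part (a) can be sketched as follows. This is a minimal sketch under my own modelling choices: the state is the stock level at review, the action is an order-up-to level y ≥ i, Poisson demand is truncated at M + 1 with the tail mass lumped, and the iteration stops when the lower and upper bounds m_n and M_n on the minimal average cost are close. All function and variable names are mine, not the book's.

```python
import numpy as np

def solve_inventory(M=100, mu=25.0, K=64.0, c=0.0, h=1.0, p_lost=5.0, eps=1e-6):
    # Truncated Poisson demand; mass of demands beyond M+1 is lumped into the last entry
    Dmax = M + 1
    pmf = np.empty(Dmax + 1)
    pmf[0] = np.exp(-mu)
    for d in range(1, Dmax + 1):
        pmf[d] = pmf[d - 1] * mu / d
    pmf[Dmax] += 1.0 - pmf.sum()
    d = np.arange(Dmax + 1)
    states = np.arange(M + 1)

    # L[y] = expected holding + lost-sales cost in a week starting at stock level y
    L = np.array([(h * np.maximum(y - d, 0) * pmf).sum()
                  + (p_lost * np.maximum(d - y, 0) * pmf).sum() for y in states])

    # T[y, j] = P(stock at next review = j | week starts at level y); next level = max(y - D, 0)
    T = np.zeros((M + 1, M + 1))
    for y in states:
        for dd in range(Dmax + 1):
            T[y, max(y - dd, 0)] += pmf[dd]

    V = np.zeros(M + 1)
    while True:
        W = L + T @ V                                # cost-to-go of starting the week at level y
        # suffix minimum of c*y + W[y] gives the cheapest order-up-to level >= i
        suf = np.minimum.accumulate((c * states + W)[::-1])[::-1]
        V_new = np.minimum(W, K - c * states + suf)
        diff = V_new - V
        m_n, M_n = diff.min(), diff.max()            # lower/upper bounds on the average cost
        V = V_new
        if M_n - m_n <= eps * m_n:
            break
    g = 0.5 * (m_n + M_n)                            # approximate minimal average cost per week

    # Recover a stationary policy: policy[i] = order-up-to level chosen in state i
    W = L + T @ V
    policy = states.copy()
    for i in states:
        ys = np.arange(i, M + 1)
        y_star = ys[np.argmin(c * ys + W[i:])]
        if K + c * (y_star - i) + W[y_star] < W[i] - 1e-9:
            policy[i] = y_star
    return g, policy

g, policy = solve_inventory()  # data of part (a): M=100, mu=25, K=64, c=0, h=1, p=5
```

Inspecting `policy` for this and other data sets lets one check experimentally that the ordering states form an interval {0, ..., s − 1} with a common order-up-to level S, i.e. an (s, S) rule.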
                  (b) Suppose the probabilistic constraint is imposed that the long-run fraction of demand
                lost should not exceed 1 − β for a prespecified service level β (note that this fraction
equals the average demand lost per week divided by µ). Use linear programming to find an
optimal control rule minimizing the long-run average cost per week subject to this service-level
                constraint. Solve for the numerical data β = 0.99, M = 100, µ = 25, K = 64, c = 0, h = 1
                and p = 0. Also compare the average cost and the service level of the optimal randomized
                policy with the average cost and the service level of the best stationary policy obtained by
                the Lagrange-multiplier approach.
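One way to set up the linear program of part (b) is the standard formulation over state–action frequencies x(i, a): minimize the average cost subject to the balance equations, normalization, and the service-level inequality. The sketch below uses SciPy's linprog; the truncation of the Poisson demand at M + 1 and all variable names are my own choices, and the penalty cost is set to 0 as the data of part (b) prescribe.

```python
import numpy as np
from scipy.optimize import linprog

# Data of part (b); p = 0 because the service-level constraint replaces the penalty cost
M, mu, K, c, h, beta = 100, 25.0, 64.0, 0.0, 1.0, 0.99

# Truncated Poisson demand with the tail mass lumped into the last entry
Dmax = M + 1
pmf = np.empty(Dmax + 1)
pmf[0] = np.exp(-mu)
for d in range(1, Dmax + 1):
    pmf[d] = pmf[d - 1] * mu / d
pmf[Dmax] += 1.0 - pmf.sum()
d = np.arange(Dmax + 1)

# Expected holding cost and expected lost demand in a week starting at level y
hold = np.array([(h * np.maximum(y - d, 0) * pmf).sum() for y in range(M + 1)])
lost = np.array([(np.maximum(d - y, 0) * pmf).sum() for y in range(M + 1)])

# Transition probabilities depend only on the order-up-to level y: next level = max(y - D, 0)
P = np.zeros((M + 1, M + 1))
for y in range(M + 1):
    for dd in range(Dmax + 1):
        P[y, max(y - dd, 0)] += pmf[dd]

# Variables x(i, y): long-run fraction of weeks in state i with order-up-to action y (y = i: no order)
pairs = [(i, y) for i in range(M + 1) for y in range(i, M + 1)]
n = len(pairs)
cost = np.array([K * (y > i) + c * (y - i) + hold[y] for i, y in pairs])

# Balance equations for each state, plus the normalization row sum x = 1
A_eq = np.zeros((M + 2, n))
for k, (i, y) in enumerate(pairs):
    A_eq[i, k] += 1.0
    A_eq[:M + 1, k] -= P[y]
    A_eq[M + 1, k] = 1.0
b_eq = np.zeros(M + 2)
b_eq[M + 1] = 1.0

# Service-level constraint: long-run fraction of demand lost <= 1 - beta
A_ub = (np.array([lost[y] for _, y in pairs]) / mu)[None, :]
b_ub = [1.0 - beta]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
avg_cost = res.fun
lost_fraction = float(A_ub @ res.x)
```

In line with the general theory for average-cost problems with a single probabilistic constraint, the optimal solution x typically randomizes between two actions in at most one state; comparing `avg_cost` with the cost of the best stationary policy from the Lagrange-multiplier approach quantifies the value of that randomization.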





                                    BIBLIOGRAPHIC NOTES
                The policy-iteration method for the discrete-time Markov decision model was
developed in Howard (1960). A theoretical foundation for Howard’s policy-iteration
method was given in Blackwell (1962); see also Denardo and Fox (1968) and
                Veinott (1966). Linear programming formulations for the Markov decision model
                were first given by De Ghellinck (1960) and Manne (1960) and streamlined later
                by Denardo and Fox (1968), Derman (1970) and Hordijk and Kallenberg (1979,
                1984). The computational usefulness of the value-iteration algorithm was greatly
                enlarged by Odoni (1969) and Hastings (1971), who introduced lower and upper