Page 306 - A First Course In Stochastic Models
P. 306

300                 SEMI-MARKOV DECISION PROCESSES

                                           EXERCISES
                7.1 Consider a production facility that operates only intermittently to manufacture a single
                product. The production will be stopped if the inventory is sufficiently high, whereas the
                production will be restarted when the inventory has dropped sufficiently low. Customers
                asking for the product arrive according to a Poisson process with rate λ. The demand of
                each customer is for one unit. Demand which cannot be satisfied directly from stock on
                hand is lost. Also, a finite capacity C for the inventory is assumed. In a production run, any
                desired lot size can be produced. The production time of a lot size of Q units is a random
                variable T Q having a probability density f Q (t). The lot size is added to the inventory at the
                end of the production run. After the completion of a production run, a new production run is
                started or the facility is closed down. At each point of time the production can be restarted.
                The production costs for a lot size of Q ≥ 1 units consist of a fixed set-up cost K > 0 and
                a variable cost c per unit produced. Also, there is a holding cost of h > 0 per unit kept in
                stock per time unit, and a lost-sales cost of p > 0 is incurred for each lost demand. The
                goal is to minimize the long-run average cost per time unit. Formulate the problem as a
                semi-Markov decision model.
                7.2 Consider the maintenance problem from Example 6.1.1 again. The numerical data are
                given in Table 6.4.1. Assume now that a repair upon failure takes either 1, 2 or 3 days,
                each with probability 1/3. Use the semi-Markov model to compute by policy iteration or
                linear programming an average cost optimal policy. Can you explain why you get the same
                optimal policy as in Example 6.1.1?
                7.3 A cargo liner operates between the five harbours A 1 , . . . , A 5 . A cargo shipment from
                harbour A i to harbour A j (j  = i) takes a random number τ ij of days (including load and
                discharge) and yields a random pay-off of ξ ij . The shipment times τ ij and the pay-offs ξ ij
                are normally distributed with means µ(τ ij ) and µ(ξ ij ) and standard deviations σ(τ ij ) and
                σ(ξ ij ). We assume the numerical data:

                                               µ(τ ij )[σ(τ ij )]
                                i\j   1      2     3      4     5

                                                          1
                                             1
                                                                 1
                                 1    -    3      6 [1]  3     2
                                             2            2      2

                                 2   4 [1]   -    1  1   7 [1]  5 [1]
                                                    4

                                 3   5 [1]  1  1   -     6 [1]  8 [1]
                                             4

                                 4   3  1   8 [1]  5 [1]  -    2  1
                                       2                         2
                                                          1

                                       1
                                 5   2      5 [1]  9 [1]  2     -
                                       2                  2
                                               µ(ξ ij )[σ(ξ ij )]
                                i\j   1      2     3      4     5
                                1     -    8 [1]  12 [2]  6 [1]  6 [1]

                                                    1
                                2   20 [3]   -    2     14 [3]  16 [2]
                                                    2

                                3   16 [3]  2  1   -    18 [3]  16 [1]
                                             2

                                4    6 [1]  10 [2]  20 [2]  -  6  1
                                                                 2
                                5    8 [2]  16 [3]  20 [2]  8 [1]  -
                Compute by policy iteration or linear programming a sailing route for which the long-run
                average reward per day is maximal.
   301   302   303   304   305   306   307   308   309   310   311