
        V_n(N) = C_f + V_{n−1}(N + 1),
    V_n(N + 1) = 0 + V_{n−1}(1).

We have applied the standard value-iteration algorithm to the numerical data
from Table 6.4.1. For each stationary policy the associated Markov chain {X_n}
is aperiodic. Taking V_0(i) = 0 for all i and the accuracy number ε = 10^−3,
the algorithm is stopped after n = 28 iterations with the stationary policy
R(n) = (0, 0, 0, 1, 2, 2) together with the lower and upper bounds m_n = 0.4336
and M_n = 0.4340. The average cost of policy R(n) is estimated by
½(m_n + M_n) = 0.4338, and this estimate cannot deviate more than 0.1% from the
theoretically minimal average cost. In fact policy R(n) is optimal, as we know
from previous results obtained by policy iteration. To get a feeling for how
strongly the required number of iterations depends on ε, we applied standard
value iteration for ε = 10^−2 and ε = 10^−4 as well. For these choices of the
accuracy number ε, standard value iteration required 21 and 35 iterations
respectively.


Example 6.6.1 A finite-capacity queue with deterministic arrivals

Consider a single-server queueing system having a finite waiting room for K
customers (including any customer in service). The arrival process of customers
is deterministic. Every D time units a customer arrives. A customer finding a
full waiting room upon arrival is lost. The service times of the customers are
independent random variables having an Erlang (r, µ) distribution. What is the
long-run fraction of customers who are lost?

Taking the constant interarrival time as time unit, the fraction of lost
customers can be seen as an average cost per time unit when a cost of 1 is
incurred each time an arriving customer finds the waiting room full. The
queueing process embedded at the arrival epochs can be described by a Markov
process by noting that the Erlang (r, µ) distributed service time can be seen as
the sum of r independent phases each having an exponential distribution with
mean 1/µ. A customer is served by serving its phases one at a time. The queueing
problem can now be converted into a Markov decision model with a single action
in each state. The state of the system is observed at the arrival epochs and the
set of possible states of the system is given by

    I = {0, 1, ..., Kr}.

State i corresponds to the situation that i uncompleted service phases are
present just prior to the arrival of a new customer. In each state i there is a
single action to be denoted by a = 0. The action a = 0 in state i corresponds to
the acceptance of the newly arriving customer when i ≤ Kr − r and corresponds to
the rejection of the customer otherwise. The one-step costs c_i(a) are given by

    c_i(a) = 0   if i ≤ Kr − r,
    c_i(a) = 1   if i > Kr − r.
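
The transition probabilities of this single-action model are worked out in the
remainder of the example. As an illustrative reconstruction (not the text's own
code), the sketch below fills them in under the observation that, during the
interarrival time D, service phases complete according to a Poisson process with
rate µ, truncated at the number of phases present; the function name
build_queue_mdp and the parameter names are assumptions made for the sketch.

import numpy as np
from scipy.stats import poisson

def build_queue_mdp(K, r, mu, D=1.0):
    # State i = number of uncompleted service phases just before an arrival.
    # The arriving customer is accepted when i <= K*r - r (cost 0) and is
    # rejected otherwise (cost 1).
    n_states = K * r + 1                        # state space I = {0, 1, ..., Kr}
    P = np.zeros((n_states, n_states))
    c = np.zeros(n_states)
    for i in range(n_states):
        if i <= K * r - r:
            L = i + r                           # accept: r new service phases added
        else:
            L = i                               # reject: waiting room is full
            c[i] = 1.0
        # j phases remain at the next arrival epoch when L - j phases complete
        for j in range(1, L + 1):
            P[i, j] = poisson.pmf(L - j, mu * D)
        P[i, 0] = poisson.sf(L - 1, mu * D)     # at least L phases complete in time D
    return P, c

Feeding these arrays into the value-iteration sketch above as a single-action
problem, for example value_iteration({0: P}, {0: c}), gives ½(m_n + M_n) as an
estimate of the long-run fraction of customers who are lost.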