$$
V_n(N) = C_f + V_{n-1}(N+1),
$$
$$
V_n(N+1) = 0 + V_{n-1}(1).
$$
We have applied the standard value-iteration algorithm to the numerical data from Table 6.4.1. For each stationary policy the associated Markov chain $\{X_n\}$ is aperiodic. Taking $V_0(i) = 0$ for all $i$ and the accuracy number $\varepsilon = 10^{-3}$, the algorithm is stopped after $n = 28$ iterations with the stationary policy $R(n) = (0, 0, 0, 1, 2, 2)$ together with the lower and upper bounds $m_n = 0.4336$ and $M_n = 0.4340$. The average cost of policy $R(n)$ is estimated by $\frac{1}{2}(m_n + M_n) = 0.4338$, and this cost cannot deviate more than 0.1% from the theoretically minimal average cost. In fact, policy $R(n)$ is optimal, as we know from previous results obtained by policy iteration. To get a feeling for how strongly the required number of iterations depends on $\varepsilon$, we applied standard value iteration for $\varepsilon = 10^{-2}$ and $\varepsilon = 10^{-4}$ as well. For these choices of the accuracy number $\varepsilon$, standard value iteration required 21 and 35 iterations respectively.
Example 6.6.1 A finite-capacity queue with deterministic arrivals
Consider a single-server queueing system having a finite waiting room for K customers (including any customer in service). The arrival process of customers is deterministic. Every D time units a customer arrives. A customer finding a full waiting room upon arrival is lost. The service times of the customers are independent random variables having an Erlang (r, µ) distribution. What is the long-run fraction of customers who are lost?
Taking the constant interarrival time as time unit, the fraction of lost customers
can be seen as an average cost per time unit when a cost of 1 is incurred each time
an arriving customer finds the waiting room full. The queueing process embedded at
the arrival epochs can be described by a Markov process by noting that the Erlang
(r, µ) distributed service time can be seen as the sum of r independent phases
each having an exponential distribution with mean 1/µ. A customer is served by
serving its phases one at a time. The queueing problem can now be converted into
a Markov decision model with a single action in each state. The state of the system
is observed at the arrival epochs and the set of possible states of the system is
given by
I = {0, 1, . . . , Kr}.
State i corresponds to the situation that i uncompleted service phases are present
just prior to the arrival of a new customer. In each state i there is a single action
to be denoted by a = 0. The action a = 0 in state i corresponds to the acceptance
of the newly arriving customer when i ≤ Kr − r and corresponds to the rejection
of the customer otherwise. The one-step costs c i (a) are given by
0 if i ≤ Kr − r,
c i (a) =
1 if i > Kr − r.
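To make the construction concrete, the sketch below builds this single-action Markov decision model and estimates the loss fraction by value iteration. The transition probabilities of the embedded chain are not given in this excerpt; the sketch assumes the standard construction for Erlang service: while uncompleted phases are present, phase completions occur according to a Poisson process with rate µ, so the number of phases completed during an interarrival time D is a truncated Poisson variable with mean µD. The function name `loss_fraction` and the parameter values in the usage line are illustrative only.

```python
import numpy as np
from scipy.stats import poisson

def loss_fraction(K, r, mu, D, eps=1e-3, max_iter=100_000):
    """Long-run fraction of lost customers in the finite-capacity queue with
    deterministic interarrival time D and Erlang(r, mu) service times.

    State i = 0, ..., K*r is the number of uncompleted service phases present
    just prior to an arrival.  There is a single action in each state, so value
    iteration reduces to iterating the embedded Markov chain with costs c_i.
    """
    num_states = K * r + 1
    # One-step costs: a cost of 1 is incurred when the arriving customer is rejected.
    c = np.where(np.arange(num_states) <= K * r - r, 0.0, 1.0)

    # Embedded transition matrix (assumed construction: the number of phases
    # completed between arrivals is Poisson(mu*D), truncated at the number of
    # phases present just after the arrival epoch).
    P = np.zeros((num_states, num_states))
    for i in range(num_states):
        phases = i + r if i <= K * r - r else i   # accept or reject the arrival
        for j in range(1, phases + 1):
            P[i, j] = poisson.pmf(phases - j, mu * D)
        P[i, 0] = 1.0 - P[i, 1:].sum()            # all phases finished before the next arrival

    # Value iteration with a single action; m_n and M_n bound the average cost,
    # which equals the long-run fraction of lost customers.
    V_prev = np.zeros(num_states)
    for _ in range(max_iter):
        V = c + P @ V_prev
        diff = V - V_prev
        m_n, M_n = diff.min(), diff.max()
        if M_n - m_n <= eps * max(m_n, np.finfo(float).tiny):
            return 0.5 * (m_n + M_n)
        V_prev = V
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative use: K = 5 waiting places, Erlang(2, mu) service with mu = 2.5, D = 1
# print(loss_fraction(K=5, r=2, mu=2.5, D=1.0))
```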