Page 303 - A First Course In Stochastic Models
P. 303
ONE-STEP POLICY IMPROVEMENT 297
derive an improved policy from the Bernoulli-splitting rule R (0) . This derivation
is based on first principles discussed in Section 6.2. The basic idea of the policy-
improvement step is to minimize for each state x the difference (x, a, R (0) )
defined by
(x, a, R (0) ) = the difference in total expected costs over an infinitely
long period of time by taking first action a and next using
policy R (0) rather than using policy R (0) from scratch
when the initial state is x.
The difference is well defined since the Markov chain associated with policy R (0)
is aperiodic. Under the Bernoulli-splitting rule the n queues act as independent
M/M/s queues. Define for each separate queue j,
D j (i) = the difference in total expected costs in queue j over
an infinitely long period of time by starting with i + 1
customers in queue j rather than with i customers.
Then, for each state x = (i 1 , . . . , i n ) and action a = k,
n
(0) (0) (0)
(x, a, R ) = p [−D j (i j ) + D k (i k )] + p × 0
j k
j=1
j =k
n
(0)
= − p D j (i j ) + D k (i k ).
j
j=1
(0)
Since the term p D j (i j ) does not depend on the action a = k, the step of
j j
minimizing (x, k, R (0) ) over k reduces to the computation of
min {D k (i k )}.
1≤k≤n
Hence a remarkably simple expression is evaluated in the policy-improvement
step applied to the Bernoulli-splitting rule. The suboptimal rule resulting from the
single application of the policy-improvement step is called the separable rule. The
performance of this rule will be discussed below.
It remains to specify the function D k (i), i = 0, 1, . . . , for each queue k. To do
so, consider an M/M/s queue in isolation, where customers arrive according to a
Poisson process with rate α and there are s exponential servers each with service
rate µ. Each arrival is admitted to the queue. The state of the system describes
the number of customers present. A cost at rate j is incurred when there are j
customers present. The long-run average cost per time unit is given by
g = L(s, α, µ).