Page 303 - A First Course In Stochastic Models
P. 303

ONE-STEP POLICY IMPROVEMENT                  297

                derive an improved policy from the Bernoulli-splitting rule R (0) . This derivation
                is based on first principles discussed in Section 6.2. The basic idea of the policy-
                improvement step is to minimize for each state x the difference  (x, a, R (0) )
                defined by

                    (x, a, R (0) ) = the difference in total expected costs over an infinitely
                                long period of time by taking first action a and next using
                                policy R (0)  rather than using policy R (0)  from scratch
                                when the initial state is x.

                The difference is well defined since the Markov chain associated with policy R (0)
                is aperiodic. Under the Bernoulli-splitting rule the n queues act as independent
                M/M/s queues. Define for each separate queue j,

                       D j (i) = the difference in total expected costs in queue j over
                               an infinitely long period of time by starting with i + 1
                               customers in queue j rather than with i customers.

                Then, for each state x = (i 1 , . . . , i n ) and action a = k,
                                         n
                                   (0)       (0)                  (0)
                            (x, a, R  ) =  p   [−D j (i j ) + D k (i k )] + p  × 0
                                            j                     k
                                        j=1
                                        j =k
                                           n
                                              (0)
                                      = −    p  D j (i j ) + D k (i k ).
                                              j
                                          j=1
                                 (0)
                Since the term  p  D j (i j ) does not depend on the action a = k, the step of
                               j  j
                minimizing  (x, k, R (0) ) over k reduces to the computation of
                                            min {D k (i k )}.
                                           1≤k≤n
                Hence a remarkably simple expression is evaluated in the policy-improvement
                step applied to the Bernoulli-splitting rule. The suboptimal rule resulting from the
                single application of the policy-improvement step is called the separable rule. The
                performance of this rule will be discussed below.
                  It remains to specify the function D k (i), i = 0, 1, . . . , for each queue k. To do
                so, consider an M/M/s queue in isolation, where customers arrive according to a
                Poisson process with rate α and there are s exponential servers each with service
                rate µ. Each arrival is admitted to the queue. The state of the system describes
                the number of customers present. A cost at rate j is incurred when there are j
                customers present. The long-run average cost per time unit is given by

                                           g = L(s, α, µ).
   298   299   300   301   302   303   304   305   306   307   308