Page 301 - A First Course In Stochastic Models
P. 301

ONE-STEP POLICY IMPROVEMENT                  295

                174, 311, 226 and 250 iterations. Modified value iteration required 59, 82, 87 and
                71 iterations and ended up with the respective bounds (m n , M n ) = (319.3, 319.5),
                (367.1, 367.4), (331.5, 331.8) and (378.0, 378.3) on the minimal average cost.


                            7.5  ONE-STEP POLICY IMPROVEMENT

                The policy-iteration algorithm has the remarkable feature that it achieves the largest
                improvements in costs in the first few iterations. These findings underlie a heuristic
                approach for Markov decision problems with a multidimensional state space. In
                such decision problems it is usually not feasible to solve the value-determination
                equations. However, a policy-improvement step offers in general no computational
                difficulties. This suggests a heuristic approach that determines first a good estimate
                for the relative values and next applies a single policy-improvement step. By the
                nature of the policy-iteration algorithm one might expect to obtain a good decision
                rule by the heuristic approach. How to compute the relative values to be used
                in the policy-improvement step typically depends on the specific application. The
                heuristic approach is illustrated in the next example.

                Example 7.5.1 Dynamic routing of customers to parallel queues

                An important queueing model arising in various practical situations is one in which
                arriving customers (messages or jobs) have to be assigned to one of several different
                groups of servers. Problems of this type occur in telecommunication networks and
                flexible manufacturing. The queueing system consists of n multi-server groups
                working in parallel, where each group has its own queue. There are s k servers in
                group k (k = 1, . . . , n). Customers arrive according to a Poisson process with rate
                λ. Upon arrival each customer has to be assigned to one of the n server groups.
                The assignment is irrevocable. The customer waits in the assigned queue until a
                server becomes available. Each server can handle only one customer at a time.
                  The problem is to find an assignment rule that (nearly) minimizes the average
                sojourn time per customer. This problem will be analysed under the assumption that
                the service times of the customers are independent and exponentially distributed.
                The mean service time of a customer assigned to queue k is 1/µ k (k = 1, . . . , n).
                                     n
                                        s
                It is assumed that λ <  k=1 k µ k . In what follows we consider the minimization
                of the overall average number of customers in the system. In view of Little’s
                formula, the minimization of the average sojourn time per customer is equivalent
                to the minimization of the average number of customers in the system.

                Bernoulli-splitting rule
                An intuitively appealing control rule is the shortest-queue rule. Under this rule
                each arriving customer is assigned to the shortest queue. Except for the special
                case of s 1 = · · · = s n and µ 1 = · · · = µ n , this rule is in general not optimal. In
                particular, the shortest-queue rule may perform quite unsatisfactorily in the situation
   296   297   298   299   300   301   302   303   304   305   306