Page 301 - A First Course In Stochastic Models
P. 301
ONE-STEP POLICY IMPROVEMENT 295
174, 311, 226 and 250 iterations. Modified value iteration required 59, 82, 87 and
71 iterations and ended up with the respective bounds (m n , M n ) = (319.3, 319.5),
(367.1, 367.4), (331.5, 331.8) and (378.0, 378.3) on the minimal average cost.
7.5 ONE-STEP POLICY IMPROVEMENT
The policy-iteration algorithm has the remarkable feature that it achieves the largest
improvements in costs in the first few iterations. These findings underlie a heuristic
approach for Markov decision problems with a multidimensional state space. In
such decision problems it is usually not feasible to solve the value-determination
equations. However, a policy-improvement step offers in general no computational
difficulties. This suggests a heuristic approach that determines first a good estimate
for the relative values and next applies a single policy-improvement step. By the
nature of the policy-iteration algorithm one might expect to obtain a good decision
rule by the heuristic approach. How to compute the relative values to be used
in the policy-improvement step typically depends on the specific application. The
heuristic approach is illustrated in the next example.
Example 7.5.1 Dynamic routing of customers to parallel queues
An important queueing model arising in various practical situations is one in which
arriving customers (messages or jobs) have to be assigned to one of several different
groups of servers. Problems of this type occur in telecommunication networks and
flexible manufacturing. The queueing system consists of n multi-server groups
working in parallel, where each group has its own queue. There are s k servers in
group k (k = 1, . . . , n). Customers arrive according to a Poisson process with rate
λ. Upon arrival each customer has to be assigned to one of the n server groups.
The assignment is irrevocable. The customer waits in the assigned queue until a
server becomes available. Each server can handle only one customer at a time.
The problem is to find an assignment rule that (nearly) minimizes the average
sojourn time per customer. This problem will be analysed under the assumption that
the service times of the customers are independent and exponentially distributed.
The mean service time of a customer assigned to queue k is 1/µ k (k = 1, . . . , n).
n
s
It is assumed that λ < k=1 k µ k . In what follows we consider the minimization
of the overall average number of customers in the system. In view of Little’s
formula, the minimization of the average sojourn time per customer is equivalent
to the minimization of the average number of customers in the system.
Bernoulli-splitting rule
An intuitively appealing control rule is the shortest-queue rule. Under this rule
each arriving customer is assigned to the shortest queue. Except for the special
case of s 1 = · · · = s n and µ 1 = · · · = µ n , this rule is in general not optimal. In
particular, the shortest-queue rule may perform quite unsatisfactorily in the situation