Page 293 - A First Course In Stochastic Models
P. 293
VALUE ITERATION AND FICTITIOUS DECISIONS 287
The algorithm for deriving an optimal stationary policy from the LP solution is the
same as in Section 6.5. In the same way as in Section 6.5 the linear programming
formulation can be extended to cover probabilistic constraints such as the fraction
of time that the system is in some subset I 0 of states should not exceed α. In
the situation of probabilistic constraints, the average cost optimal policy usually
involves randomized decisions.
7.3 VALUE ITERATION AND FICTITIOUS DECISIONS
The value-iteration method is often the most preferred method to compute a (nearly)
average cost optimal policy. In each iteration of the method the lower and upper
bounds indicate how much the average cost of the current policy deviates from
the minimal average cost. The computational burden of the value-iteration algo-
rithm depends not only on the number of states, but also on the density of the
non-zero transition probabilities p ij (a). By the very nature of the value-iteration
algorithm, it is computationally burdensome to have many non-zero p ij (a). In
applications with exponentially distributed times between the decision epochs, the
computational effort of the value-iteration algorithm can often be considerably
reduced by including so-called fictitious decision epochs. The state of the system
is left unchanged at the fictitious decision epochs. The inclusion of fictitious deci-
sion epochs does not change the Markovian nature of the decision process, since
the times between state transitions are exponentially distributed and thus have
the memoryless property. The trick of fictitious decision epochs reduces not only
the computational effort, but also simplifies the formulation of the value-iteration
algorithm. The inclusion of fictitious decision epochs has as a consequence that the
state space must be enlarged with an indicator variable to distinguish between the
fictitious decision epochs and the real decision epochs. However, the greater sim-
plicity in formulation and the reduction in computing times outweigh the enlarged
state space.
Example 7.3.1 Optimal allocation of servers to competing customers
In communication networks an important problem is the allocation of servers to
competing customer classes. Suppose messages of the types 1 and 2 arrive at a
communication system according to independent Poisson processes with respective
rates λ 1 and λ 2 . The communication system has c identical transmission channels
for handling the messages, where each channel can handle only one message at a
time. The system has no buffer for storing temporarily messages that find all chan-
nels occupied upon arrival. Such messages have to be rejected anyway. However,
a newly arriving message may also be rejected when there is a free channel. The
goal is to find a control rule that minimizes the average rejection rate or, equiva-
lently, maximizes the average throughput of accepted messages. In Example 5.4.2
the best control rule was determined within the subclass of L-policies. Markov
decision theory enables us to compute an overall optimal policy. To do so, we