
The algorithm for deriving an optimal stationary policy from the LP solution is the
same as in Section 6.5. As in Section 6.5, the linear programming formulation can
be extended to cover probabilistic constraints, such as the requirement that the
fraction of time the system spends in some subset I_0 of states should not exceed α.
In the presence of probabilistic constraints, the average cost optimal policy usually
involves randomized decisions.
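To make the extension concrete, the sketch below sets up such a constrained linear program for a small hypothetical Markov decision problem. The numbers, the state/action layout and the use of scipy.optimize.linprog are illustrative assumptions and not part of the text; the variables x_{ia} denote the long-run fraction of decision epochs at which the state is i and action a is chosen, and the extra inequality caps the total mass placed on the states in I_0 at α.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-state, 2-action Markov decision problem, for illustration only.
# p[a][i][j] = transition probability p_ij(a); c[i][a] = one-step cost c_i(a).
n_states, n_actions = 3, 2
p = np.array([
    [[0.5, 0.5, 0.0],        # action 0
     [0.2, 0.6, 0.2],
     [0.0, 0.4, 0.6]],
    [[0.9, 0.1, 0.0],        # action 1
     [0.5, 0.4, 0.1],
     [0.3, 0.3, 0.4]],
])
c = np.array([[1.0, 3.0],
              [2.0, 4.0],
              [6.0, 8.0]])

I0, alpha = [2], 0.20        # fraction of time in the states of I0 must not exceed alpha

# Decision variables x_{ia}, flattened with index i * n_actions + a.
n_vars = n_states * n_actions
obj = c.flatten()

# Balance equations sum_a x_{ja} - sum_{i,a} p_ij(a) x_{ia} = 0 for every j,
# plus the normalization sum_{i,a} x_{ia} = 1.
A_eq = np.zeros((n_states + 1, n_vars))
for j in range(n_states):
    for i in range(n_states):
        for a in range(n_actions):
            A_eq[j, i * n_actions + a] -= p[a][i][j]
    for a in range(n_actions):
        A_eq[j, j * n_actions + a] += 1.0
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# Probabilistic constraint: sum_{i in I0, a} x_{ia} <= alpha.
A_ub = np.zeros((1, n_vars))
for i in I0:
    A_ub[0, i * n_actions:(i + 1) * n_actions] = 1.0
b_ub = np.array([alpha])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_vars, method="highs")
x = res.x.reshape(n_states, n_actions)

# A stationary (possibly randomized) policy: in state i choose action a with
# probability x_{ia} / sum_a x_{ia} whenever the denominator is positive.
with np.errstate(invalid="ignore", divide="ignore"):
    policy = np.nan_to_num(x / x.sum(axis=1, keepdims=True))
print("average cost:", res.fun)
print("policy:\n", policy)
```

In this formulation the optimal solution may put positive mass on two actions in the same state, which is exactly the randomization alluded to above.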


                    7.3   VALUE ITERATION AND FICTITIOUS DECISIONS

The value-iteration method is often the preferred method for computing a (nearly)
average cost optimal policy. In each iteration of the method the lower and upper
bounds indicate how much the average cost of the current policy deviates from
the minimal average cost. The computational burden of the value-iteration algo-
rithm depends not only on the number of states, but also on the density of the
non-zero transition probabilities p_{ij}(a). By the very nature of the value-iteration
algorithm, it is computationally burdensome to have many non-zero p_{ij}(a). In
applications with exponentially distributed times between the decision epochs, the
computational effort of the value-iteration algorithm can often be considerably
reduced by including so-called fictitious decision epochs. The state of the system
is left unchanged at the fictitious decision epochs. The inclusion of fictitious deci-
sion epochs does not change the Markovian nature of the decision process, since
the times between state transitions are exponentially distributed and thus have
the memoryless property. The trick of fictitious decision epochs not only reduces
the computational effort, but also simplifies the formulation of the value-iteration
algorithm. A consequence of including fictitious decision epochs is that the state
space must be enlarged with an indicator variable to distinguish between the
fictitious decision epochs and the real decision epochs. However, the greater sim-
plicity of formulation and the reduction in computing times outweigh the drawback
of the enlarged state space.
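As a point of reference for the bounds just mentioned, the following sketch implements the basic value-iteration recursion for a finite average-cost Markov decision problem with the usual lower and upper bounds m_n = min_i (V_n(i) − V_{n−1}(i)) and M_n = max_i (V_n(i) − V_{n−1}(i)). The array layout, the stopping criterion and the unichain assumption are illustrative choices for this sketch rather than the book's exact formulation.

```python
import numpy as np

def value_iteration(p, c, eps=1e-6, max_iter=100_000):
    """Basic value iteration for a finite average-cost MDP.

    p[a, i, j] is the transition probability p_ij(a) and c[i, a] the
    one-step cost; the model is assumed unichain and aperiodic so that
    the bounds m_n and M_n both converge to the minimal average cost.
    """
    n_states, n_actions = c.shape
    v = np.zeros(n_states)
    for n in range(1, max_iter + 1):
        # Q[i, a] = c_i(a) + sum_j p_ij(a) * V_{n-1}(j)
        q = c + np.einsum('aij,j->ia', p, v)
        v_new = q.min(axis=1)            # V_n(i)
        policy = q.argmin(axis=1)        # minimizing action in each state
        diff = v_new - v
        m, M = diff.min(), diff.max()    # m_n <= minimal average cost <= M_n
        if M - m <= eps * max(m, eps):   # stop when the bounds are close
            return policy, m, M, n
        v = v_new
    return policy, m, M, max_iter
```

The returned policy is (nearly) average cost optimal once M − m is small. The sparser the matrices p[a], the cheaper the summation over j in each iteration, which is precisely why reducing the number of non-zero p_{ij}(a) through fictitious decision epochs pays off.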


Example 7.3.1 Optimal allocation of servers to competing customers
In communication networks an important problem is the allocation of servers to
competing customer classes. Suppose messages of types 1 and 2 arrive at a
communication system according to independent Poisson processes with respective
rates λ_1 and λ_2. The communication system has c identical transmission channels
for handling the messages, where each channel can handle only one message at a
time. The system has no buffer for temporarily storing messages that find all chan-
nels occupied upon arrival. Such messages have to be rejected anyway. However,
a newly arriving message may also be rejected when there is a free channel. The
goal is to find a control rule that minimizes the average rejection rate or, equiva-
lently, maximizes the average throughput of accepted messages. In Example 5.4.2
the best control rule was determined within the subclass of L-policies. Markov
decision theory enables us to compute an overall optimal policy. To do so, we