Page 296 - A First Course In Stochastic Models
P. 296

290                 SEMI-MARKOV DECISION PROCESSES

                provided we put V n−1 (i 1 , i 2 , 1) = V n−1 (i 1 , i 2 , 2) = ∞ when i 1 +i 2 = c+1 in order
                to exclude the unfeasible decision a = 1 in the states (i 1 , i 2 , 1) with i 1 + i 2 = c. A
                similar expression applies to V n (i 1 , i 2 , 2). This completes the specification of the
                recursion step of the value-iteration algorithm. The other steps of the algorithm go
                without saying.
                  The value-iteration algorithm for the semi-Markov decision formulation with
                fictitious decision epochs requires the extra states (i 1 , i 2 , 0). However, the number
                                                                   2
                of additions and multiplications per iteration is of the order c rather than of the
                     4
                order c as in a straightforward semi-Markov decision formulation. It appears from
                numerical experiments that there is a considerable overall reduction in computa-
                tional effort when using the formulation with fictitious decision epochs. A further
                reduction in the computing times can be achieved by applying modified value
                iteration rather than standard value iteration; see Section 6.6.
                  Numerical investigations indicate that the overall optimal control rule has an
                intuitively appealing structure. It is characterized by integers L 0 , L 1 , . . . , L c−1
                with L 0 ≥ L 1 ≥ · · · ≥ L c−1 . One type of message (call it the priority type) is
                always accepted as long as not all transmission channels are occupied. An arriving
                message of the non-priority type finding i priority type messages present upon
                arrival is only accepted when less than L i non-priority type messages are present
                and not all channels are occupied. In the numerical example with c = 10, λ 1 = 10,
                λ 2 = 7, µ 1 = 10 and µ 2 = 1, the optimal L i -values are given by
                           L 0 = L 1 = 8, L 2 = L 3 = 7, L 4 = 6, L 5 = 5, L 6 = 4,
                                  L 7 = 3, L 8 = 2, L 9 = 1.

                The minimal average loss rate is 1.767. A challenging open problem is to find
                a theoretical proof that an overall optimal policy has the L i -structure. Another
                empirical result that deserves further investigation is the finding that the average
                loss rate under an L i -policy is nearly insensitive to the form of the probability
                distributions of the transmission times; see also the discussion in Example 5.4.2.


                                7.4  OPTIMIZATION OF QUEUES

                The semi-Markov model is a natural and powerful tool for the optimization of
                queues. Many queueing problems in telecommunication ask for the computation
                of an optimal control rule for a given performance measure. If the control rule is
                determined by one or two parameters, one might first use Markov chain analysis
                to calculate the performance measure for given values of the control parameters
                and next use a standard optimization procedure to find the optimal values of the
                control parameters. However, this is not always the most effective approach. Below
                we give an example of a controlled queueing system for which the semi-Markov
                decision approach is not only more elegant, but is also more effective than a direct
                search procedure. In this application the number of states is unbounded. However,
                by exploiting the structure of the problem, we are able to cast the problem into
   291   292   293   294   295   296   297   298   299   300   301