Page 296 - A First Course In Stochastic Models

P. 296

290 SEMI-MARKOV DECISION PROCESSES

provided we put V n−1 (i 1 , i 2 , 1) = V n−1 (i 1 , i 2 , 2) = ∞ when i 1 +i 2 = c+1 in order
to exclude the unfeasible decision a = 1 in the states (i 1 , i 2 , 1) with i 1 + i 2 = c. A
similar expression applies to V n (i 1 , i 2 , 2). This completes the speciﬁcation of the
recursion step of the value-iteration algorithm. The other steps of the algorithm go
without saying.
The value-iteration algorithm for the semi-Markov decision formulation with
ﬁctitious decision epochs requires the extra states (i 1 , i 2 , 0). However, the number
2
of additions and multiplications per iteration is of the order c rather than of the
4
order c as in a straightforward semi-Markov decision formulation. It appears from
numerical experiments that there is a considerable overall reduction in computa-
tional effort when using the formulation with ﬁctitious decision epochs. A further
reduction in the computing times can be achieved by applying modiﬁed value
iteration rather than standard value iteration; see Section 6.6.
Numerical investigations indicate that the overall optimal control rule has an
intuitively appealing structure. It is characterized by integers L 0 , L 1 , . . . , L c−1
with L 0 ≥ L 1 ≥ · · · ≥ L c−1 . One type of message (call it the priority type) is
always accepted as long as not all transmission channels are occupied. An arriving
message of the non-priority type ﬁnding i priority type messages present upon
arrival is only accepted when less than L i non-priority type messages are present
and not all channels are occupied. In the numerical example with c = 10, λ 1 = 10,
λ 2 = 7, µ 1 = 10 and µ 2 = 1, the optimal L i -values are given by
L 0 = L 1 = 8, L 2 = L 3 = 7, L 4 = 6, L 5 = 5, L 6 = 4,
L 7 = 3, L 8 = 2, L 9 = 1.

The minimal average loss rate is 1.767. A challenging open problem is to ﬁnd
a theoretical proof that an overall optimal policy has the L i -structure. Another
empirical result that deserves further investigation is the ﬁnding that the average
loss rate under an L i -policy is nearly insensitive to the form of the probability
distributions of the transmission times; see also the discussion in Example 5.4.2.

7.4 OPTIMIZATION OF QUEUES

The semi-Markov model is a natural and powerful tool for the optimization of
queues. Many queueing problems in telecommunication ask for the computation
of an optimal control rule for a given performance measure. If the control rule is
determined by one or two parameters, one might ﬁrst use Markov chain analysis
to calculate the performance measure for given values of the control parameters
and next use a standard optimization procedure to ﬁnd the optimal values of the
control parameters. However, this is not always the most effective approach. Below
we give an example of a controlled queueing system for which the semi-Markov
decision approach is not only more elegant, but is also more effective than a direct
search procedure. In this application the number of states is unbounded. However,
by exploiting the structure of the problem, we are able to cast the problem into

291 292 293 294 295 296 297 298 299 300 301