

and

$$\lim_{m\to\infty} \frac{1}{m}\, E(\text{time over the first } m \text{ decision epochs})
  = \frac{E(\text{length of one cycle})}{E(\text{number of transitions in one cycle})}.$$
                Together the above three relations yield (7.1.2). The remainder of the proof is
                simple. Obviously, we have
$$E(\text{costs over the first } m \text{ decision epochs})
  = \sum_{t=0}^{m-1} \sum_{j\in I} c_j(R_j)\, p_{rj}^{(t)}(R)$$
                and
$$E(\text{time over the first } m \text{ decision epochs})
  = \sum_{t=0}^{m-1} \sum_{j\in I} \tau_j(R_j)\, p_{rj}^{(t)}(R).$$
Dividing the numerator and the denominator of the right-hand side of (7.1.2) by $m$, letting $m \to \infty$ and using $\lim_{m\to\infty} (1/m)\sum_{t=0}^{m-1} p_{rj}^{(t)}(R) = \pi_j(R)$, the result (7.1.1) follows when the initial state $X_0 = r$. For initial state $X_0 = i$ the result then follows by mimicking the proof of Theorem 3.5.11 and noting that state $r$ is reached from state $i$ with probability 1 after finitely many transitions.
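To make the last step explicit, dividing the two expectations above by $m$ and invoking the Cesàro limit turns (7.1.2) into a ratio of stationary averages; the restatement below assumes (7.1.1) has this ratio form:

```latex
% Division step made explicit: divide numerator and denominator of (7.1.2)
% by m, let m -> infinity, and use the Cesaro limit of p_{rj}^{(t)}(R).
\[
  g_r(R)
  = \lim_{m\to\infty}
    \frac{\frac{1}{m}\sum_{t=0}^{m-1}\sum_{j\in I} c_j(R_j)\,p_{rj}^{(t)}(R)}
         {\frac{1}{m}\sum_{t=0}^{m-1}\sum_{j\in I} \tau_j(R_j)\,p_{rj}^{(t)}(R)}
  = \frac{\sum_{j\in I} c_j(R_j)\,\pi_j(R)}
         {\sum_{j\in I} \tau_j(R_j)\,\pi_j(R)}.
\]
```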
A stationary policy $R^*$ is said to be average cost optimal if $g_i(R^*) \leq g_i(R)$ for
all $i \in I$ and all stationary policies $R$. The algorithms for computing an average
                cost optimal policy in the discrete-time Markov decision model can be extended to
                the semi-Markov decision model. This will be done in the next section. However,
                before doing this, we discuss a data-transformation method that converts the semi-
                Markov decision model into a discrete-time Markov decision model such that for
                each stationary policy the average cost per time unit in the discrete-time Markov
                model is the same as in the semi-Markov model. This is a very useful result. The
                data-transformation method is an extension of the uniformization technique for
                continuous-time Markov chains discussed in Section 4.5.


                The data-transformation method
                First choose a number τ with

$$0 < \tau \leq \min_{i,a}\, \tau_i(a).$$
                Consider now the discrete-time Markov decision model whose basic elements are
                given by

$$\bar{I} = I, \qquad \bar{A}(i) = A(i), \quad i \in I,$$
$$\bar{c}_i(a) = c_i(a)/\tau_i(a), \quad a \in A(i) \text{ and } i \in I,$$
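As a concrete illustration, the following sketch carries out this data transformation on a small made-up model. The states, actions, costs and holding times below are invented for illustration only, and the transformed transition probabilities, whose definition continues beyond this page, are assumed here to follow the uniformization pattern of Section 4.5.

```python
# Sketch of the data transformation, under the assumptions stated above.
# tau_hold[i][a] = tau_i(a), cost[i][a] = c_i(a), p[i][a][j] = p_ij(a)
# for a hypothetical two-state model with one action per state.

tau_hold = {0: {0: 2.0}, 1: {0: 5.0}}                # holding times tau_i(a)
cost     = {0: {0: 6.0}, 1: {0: 20.0}}               # one-step costs c_i(a)
p        = {0: {0: [0.3, 0.7]}, 1: {0: [0.6, 0.4]}}  # transition probs p_ij(a)

# Step 1: choose tau with 0 < tau <= min_{i,a} tau_i(a).
tau = min(t for acts in tau_hold.values() for t in acts.values())

# Step 2: transformed one-step costs c_bar_i(a) = c_i(a) / tau_i(a).
cost_bar = {i: {a: cost[i][a] / tau_hold[i][a] for a in cost[i]} for i in cost}

# Step 3 (assumed, uniformization-style as in Section 4.5; the text's own
# definition continues past this excerpt):
#   p_bar_ij(a) = (tau / tau_i(a)) * p_ij(a)                      for j != i,
#   p_bar_ii(a) = (tau / tau_i(a)) * p_ii(a) + 1 - tau / tau_i(a).
p_bar = {}
for i in p:
    p_bar[i] = {}
    for a in p[i]:
        ratio = tau / tau_hold[i][a]
        row = [ratio * pij for pij in p[i][a]]
        row[i] += 1.0 - ratio        # leftover mass becomes a self-transition
        p_bar[i][a] = row

print("tau      =", tau)
print("cost_bar =", cost_bar)
print("p_bar    =", p_bar)
```

With this choice each row of p_bar still sums to 1, and the text's claim is that the average cost per time unit under any stationary policy is unchanged by the transformation.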