Page 289 - A First Course In Stochastic Models

THE SEMI-MARKOV DECISION MODEL


$$
\overline{p}_{ij}(a) =
\begin{cases}
(\tau/\tau_i(a))\,p_{ij}(a), & j \neq i,\ a \in A(i) \text{ and } i \in I,\\[4pt]
(\tau/\tau_i(a))\,p_{ij}(a) + \bigl[1 - (\tau/\tau_i(a))\bigr], & j = i,\ a \in A(i) \text{ and } i \in I.
\end{cases}
$$
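As a quick sanity check, the transformed probabilities can be sketched numerically. The matrix `p`, the mean times `tau_i` and the constant `tau` below are hypothetical illustration data, chosen only so that $0 < \tau \le \min_i \tau_i(a)$; each row of the transformed matrix must again sum to one.

```python
import numpy as np

# Hypothetical 2-state example under a fixed policy R:
# p[i][j] = p_ij(R_i), tau_i[i] = tau_i(R_i), and tau a constant
# with 0 < tau <= min_i tau_i, so the diagonal terms stay nonnegative.
p = np.array([[0.2, 0.8],
              [0.5, 0.5]])
tau_i = np.array([2.0, 4.0])
tau = 1.0

# Transformed one-step probabilities of the discrete-time model:
# (tau/tau_i) * p_ij off the diagonal, plus the extra self-loop
# mass 1 - tau/tau_i added on the diagonal (case j = i).
p_bar = (tau / tau_i)[:, None] * p + np.diag(1.0 - tau / tau_i)

print(p_bar.sum(axis=1))  # each row sums to 1
```

The diagonal correction simply redistributes the "missing" probability mass $1-\tau/\tau_i(a)$ as a self-loop, which is what makes the transformed matrix stochastic again.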
This discrete-time Markov decision model has the same class of stationary policies as the original semi-Markov decision model. For each stationary policy $R$, let $\overline{g}_i(R)$ denote the long-run average cost per time unit in the discrete-time model when policy $R$ is used and the initial state is $i$. Then it holds for each stationary policy $R$ that
$$
\overline{g}_i(R) = g_i(R), \qquad i \in I. \tag{7.1.3}
$$
This result does not require any assumption about the chain structure of the Markov chains associated with the stationary policies. However, we prove the result (7.1.3) only for the unichain case. Fix a stationary policy $R$ and assume that the embedded Markov chain $\{X_n\}$ in the semi-Markov model has no two disjoint closed sets. Denote by $\overline{X}_n$ the state at the $n$th decision epoch in the transformed discrete-time model. It is directly seen that the Markov chain $\{\overline{X}_n\}$ is also unichain under policy $R$. The equilibrium probabilities $\overline{\pi}_j(R)$ of the Markov chain $\{\overline{X}_n\}$ satisfy the equilibrium equations


$$
\overline{\pi}_j(R) = \sum_{i \in I} \overline{\pi}_i(R)\,\overline{p}_{ij}(R_i)
= \sum_{i \in I} \overline{\pi}_i(R)\,\frac{\tau}{\tau_i(R_i)}\,p_{ij}(R_i)
+ \left[1 - \frac{\tau}{\tau_j(R_j)}\right]\overline{\pi}_j(R), \qquad j \in I.
$$
Hence, letting $u_j = \overline{\pi}_j(R)/\tau_j(R_j)$ and dividing by $\tau$, we find that
$$
u_j = \sum_{i \in I} u_i\,p_{ij}(R_i), \qquad j \in I.
$$
                These equations are precisely the equilibrium equations for the equilibrium prob-
                abilities π j (R) of the embedded Markov chain {X n } in the semi-Markov model.
                The equations determine the π j (R) uniquely up to a multiplicative constant. Thus,
for some constant $\gamma > 0$,
$$
\pi_j(R) = \gamma\,\frac{\overline{\pi}_j(R)}{\tau_j(R_j)}, \qquad j \in I.
$$

Since $\sum_{j \in I} \overline{\pi}_j(R) = 1$, it follows that $\gamma = \sum_{j \in I} \tau_j(R_j)\,\pi_j(R)$. The desired result (7.1.3) now follows easily. We have
$$
\overline{g}(R) = \sum_{j \in I} \overline{c}_j(R_j)\,\overline{\pi}_j(R)
= \frac{1}{\gamma} \sum_{j \in I} \frac{c_j(R_j)}{\tau_j(R_j)}\,\pi_j(R)\,\tau_j(R_j)
= \sum_{j \in I} c_j(R_j)\,\pi_j(R) \Big/ \sum_{j \in I} \tau_j(R_j)\,\pi_j(R)
$$
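The whole argument can also be checked numerically on a small example. The two-state data below (`p`, `tau_i`, `c`, `tau`) are hypothetical, and `stationary` is a small helper (not from the text) that solves the equilibrium equations by least squares; the sketch compares the semi-Markov average cost $\sum_j c_j(R_j)\pi_j(R)\big/\sum_j \tau_j(R_j)\pi_j(R)$ with the average cost of the transformed discrete-time chain.

```python
import numpy as np

# Hypothetical data for a 2-state semi-Markov process under a fixed
# policy R: embedded transition matrix p, mean holding times tau_i(R_i),
# one-stage costs c_i(R_i), and the constant tau of the transformation.
p = np.array([[0.2, 0.8],
              [0.5, 0.5]])
tau_i = np.array([2.0, 4.0])
c = np.array([3.0, 7.0])
tau = 1.0

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # pi P = pi, sum(pi) = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Semi-Markov average cost via the embedded chain:
# g(R) = sum_j c_j pi_j / sum_j tau_j pi_j.
pi = stationary(p)
g = (c @ pi) / (tau_i @ pi)

# Transformed discrete-time model: p_bar as in the data transformation,
# one-step costs c_bar_j = c_j / tau_j(R_j).
p_bar = (tau / tau_i)[:, None] * p + np.diag(1.0 - tau / tau_i)
pi_bar = stationary(p_bar)
g_bar = (c / tau_i) @ pi_bar

print(np.isclose(g, g_bar))  # the two average costs coincide
```

The check also confirms the relation derived above: the stationary probabilities of the transformed chain satisfy $\overline{\pi}_j(R) \propto \tau_j(R_j)\,\pi_j(R)$, and normalizing yields $\gamma = \sum_j \tau_j(R_j)\pi_j(R)$.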