
on the interval (0, 1). Using the condition (3.3.12), it is readily verified that the equation (3.3.15) has a unique solution on (0, 1). The result (3.3.14) can be proved in several ways. A direct way is to try a solution of the form $\pi_j = \gamma \eta^j$, $j \geq 0$, for constants $\gamma > 0$ and $0 < \eta < 1$ and to substitute this form into (3.3.13). By doing so, one finds that $\eta$ satisfies the equation (3.3.15). The constant $\gamma$ follows from $\sum_{j=0}^{\infty} \pi_j = 1$. More sophisticated proofs of result (3.3.14) are given in Sections 3.4.2 and 3.5.2.
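As a brief check of the normalization step (a sketch using only the geometric form above, not the specific balance equations (3.3.13)), summing the geometric series gives

$$\sum_{j=0}^{\infty} \gamma \eta^{j} = \frac{\gamma}{1-\eta} = 1 \quad\Longrightarrow\quad \gamma = 1-\eta, \qquad\text{so that}\qquad \pi_j = (1-\eta)\eta^{j}, \quad j \geq 0.$$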
                3.3.3 The Long-run Average Reward per Time Unit

A very useful applied probability model is the Markov chain model on which a reward or cost structure is imposed. Suppose that a reward $f(j)$ is earned each time the Markov chain visits state $j$ for $j \in I$. The ergodic theorem shows how to compute the long-run average reward per time unit in terms of the equilibrium probabilities $\pi_j$. In addition to Assumption 3.3.1 involving the regeneration state $r$, we need the following assumption.

Assumption 3.3.2 (a) The total reward earned between two visits of the Markov chain to state $r$ has a finite expectation and $\sum_{j \in I} |f(j)|\, \pi_j < \infty$.

(b) For each initial state $X_0 = i$ with $i \neq r$, the total reward earned until the first visit of the Markov chain to state $r$ is finite with probability 1.

                  This assumption is automatically satisfied when the Markov chain has a finite
                state space and satisfies Assumption 3.3.1.

Theorem 3.3.3 Suppose the Markov chain $\{X_n\}$ satisfies Assumptions 3.3.1 and 3.3.2. Then the long-run average reward per time unit is

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j \in I} f(j)\pi_j \quad \text{with probability 1}$$

for each initial state $X_0 = i$.
Intuitively, this theorem is obvious by noting that the long-run average number of visits to state $j$ per time unit equals $\pi_j$ with probability 1 for each state $j \in I$. A formal proof of Theorem 3.3.3 is given in Section 3.5.2.
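To illustrate the theorem numerically (a minimal simulation sketch, not part of the text; the two-state transition matrix and the reward values below are assumptions chosen only for this example), one can compare the simulated time average $(1/n)\sum_{k=1}^{n} f(X_k)$ with $\sum_{j} f(j)\pi_j$:

```python
import numpy as np

# Illustrative two-state Markov chain (assumed for this sketch, not from the text).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])      # one-step transition probabilities p_jk
f = np.array([1.0, 5.0])        # reward f(j) earned at each visit to state j

# Equilibrium probabilities: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Simulate the chain and form the time average (1/n) * sum_{k=1}^n f(X_k).
rng = np.random.default_rng(seed=1)
n, state, total = 200_000, 0, 0.0
for _ in range(n):
    state = rng.choice(2, p=P[state])
    total += f[state]

print("simulated average reward :", total / n)     # should be close to the value below
print("sum_j f(j) * pi_j        :", float(f @ pi))
```

For this particular chain $\pi = (4/7, 3/7)$, so both numbers should be close to $19/7 \approx 2.714$.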

Remark 3.3.1 A useful modification of Theorem 3.3.3
In Theorem 3.3.3 the reward function refers to an immediate reward $f(j)$ that is earned each time the Markov chain visits state $j$. However, in practical applications it often happens that rewards are gradually earned during the time between the state transitions of the Markov chain. For those situations, define the reward function $f(j)$ by

$f(j)$ = the expected reward earned until the next state transition when a state transition has just occurred to state $j$.
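As a hedged illustration of how such an $f(j)$ could be computed (the per-transition reward notation $r(j,k)$ is an assumption introduced here for the example, not notation from the text): if a reward with expectation $r(j,k)$ accrues during a transition from state $j$ to state $k$, then conditioning on the next state gives

$$f(j) = \sum_{k \in I} p_{jk}\, r(j,k).$$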