on the interval (0, 1). Using the condition (3.3.12), it is readily verified that the
equation (3.3.15) has a unique solution on (0, 1). The result (3.3.14) can be proved
in several ways. A direct way is to try a solution of the form π_j = γη^j, j ≥ 0,
for constants γ > 0 and 0 < η < 1 and to substitute this form into (3.3.13).
By doing so, one finds that η satisfies the equation (3.3.15). The constant γ
follows from ∑_{j=0}^∞ π_j = 1. More sophisticated proofs of result (3.3.14) are given
in Sections 3.4.2 and 3.5.2.
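As a quick numerical illustration of this recipe, note that the normalization ∑_{j=0}^∞ γη^j = γ/(1 − η) = 1 forces γ = 1 − η once η is known. The sketch below is purely illustrative: equation (3.3.15) is not reproduced on this page, so a hypothetical stand-in function h takes its place in the root-finding step; solve_eta, h and all numerical values are assumptions made for the sketch. It locates η on (0, 1) by bisection and then forms π_j = (1 − η)η^j.

```python
import numpy as np

def solve_eta(h, lo=1e-9, hi=1.0 - 1e-9, tol=1e-12):
    """Bisection on (0, 1); assumes h is continuous and changes sign there."""
    flo = h(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * h(mid) <= 0.0:
            hi = mid
        else:
            lo, flo = mid, h(mid)
    return 0.5 * (lo + hi)

# Hypothetical stand-in for equation (3.3.15); the actual equation depends on
# the model data and is not reproduced here.  Its unique root on (0, 1) plays
# the role of eta.
h = lambda eta: eta - 0.3 - 0.5 * eta ** 2

eta = solve_eta(h)                 # unique root of the stand-in equation on (0, 1)
gamma = 1.0 - eta                  # from sum_{j>=0} gamma * eta**j = gamma/(1-eta) = 1
pi = gamma * eta ** np.arange(50)  # pi_j = (1-eta) * eta**j, truncated at j = 49
print(eta, gamma, pi.sum())        # the truncated sum is very close to 1
```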
3.3.3 The Long-run Average Reward per Time Unit
A very useful applied probability model is the Markov chain model on which a
reward or cost structure is imposed. Suppose that a reward f(j) is earned each
time the Markov chain visits state j for j ∈ I. The ergodic theorem shows how
to compute the long-run average reward per time unit in terms of the equilibrium
probabilities π_j. In addition to Assumption 3.3.1 involving the regeneration state
r, we need the following assumption.
Assumption 3.3.2 (a) The total reward earned between two visits of the Markov
chain to state r has a finite expectation and

$$\sum_{j \in I} |f(j)|\,\pi_j < \infty.$$

(b) For each initial state X_0 = i with i ≠ r, the total reward earned until the
first visit of the Markov chain to state r is finite with probability 1.
This assumption is automatically satisfied when the Markov chain has a finite
state space and satisfies Assumption 3.3.1.
Theorem 3.3.3 Suppose the Markov chain {X_n} satisfies Assumptions 3.3.1 and
3.3.2. Then the long-run average reward per time unit is

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j \in I} f(j)\,\pi_j \qquad \text{with probability 1}$$

for each initial state X_0 = i.
Intuitively this theorem is obvious by noting that the long-run average number
of visits to state j per time unit equals π_j with probability 1 for each state j ∈ I.
A formal proof of Theorem 3.3.3 is given in Section 3.5.2.
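The statement of Theorem 3.3.3 is also easy to check numerically. The following sketch uses a small illustrative three-state chain; the transition matrix P, the reward vector f and all numbers are made up for the illustration and do not come from the text. It solves the equilibrium equations for π, evaluates ∑_{j∈I} f(j)π_j, and compares this with the simulated time average (1/n)∑_{k=1}^n f(X_k).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative three-state Markov chain with reward f(j) per visit to state j.
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
f = np.array([0.0, 1.0, 4.0])

# Equilibrium probabilities: solve pi P = pi together with sum_j pi_j = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print("sum_j f(j) pi_j       :", f @ pi)     # average reward predicted by the theorem

# Time average (1/n) * sum_{k=1}^{n} f(X_k) along one simulated path.
n, x, total = 200_000, 0, 0.0
for _ in range(n):
    x = rng.choice(3, p=P[x])
    total += f[x]
print("simulated time average:", total / n)  # close to the predicted value for large n
```

For large n the two printed numbers agree up to the usual Monte Carlo noise, which is exactly the "with probability 1" convergence asserted by the theorem.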
Remark 3.3.1 A useful modification of Theorem 3.3.3
In Theorem 3.3.3 the reward function refers to an immediate reward f(j) that is
earned each time the Markov chain visits state j. In practical applications, however,
it often happens that rewards are earned gradually during the time between the state
transitions of the Markov chain. For such situations, define the reward function
f(j) by

f(j) = the expected reward earned until the next state transition
       when a state transition has just occurred to state j.