on the interval (0, 1). Using the condition (3.3.12), it is readily verified that the
equation (3.3.15) has a unique solution on (0, 1). The result (3.3.14) can be proved
in several ways. A direct way is to try a solution of the form π_j = γη^j, j ≥ 0,
for constants γ > 0 and 0 < η < 1 and to substitute this form into (3.3.13).
By doing so, one finds that η satisfies the equation (3.3.15). The constant γ
follows from ∑_{j=0}^∞ π_j = 1. More sophisticated proofs of result (3.3.14) are given
in Sections 3.4.2 and 3.5.2.
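As a quick numerical illustration of this recipe, note that the normalization ∑_{j=0}^∞ γη^j = γ/(1 − η) = 1 forces γ = 1 − η once η is known. The sketch below is purely illustrative: equation (3.3.15) is not reproduced on this page, so a hypothetical stand-in function h takes its place in the root-finding step; solve_eta, h and all numerical values are assumptions made for the sketch. It locates η on (0, 1) by bisection and then forms π_j = (1 − η)η^j.

```python
import numpy as np

def solve_eta(h, lo=1e-9, hi=1.0 - 1e-9, tol=1e-12):
    """Bisection on (0, 1); assumes h is continuous and changes sign there."""
    flo = h(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * h(mid) <= 0.0:
            hi = mid
        else:
            lo, flo = mid, h(mid)
    return 0.5 * (lo + hi)

# Hypothetical stand-in for equation (3.3.15); the actual equation depends on
# the model data and is not reproduced here.  Its unique root on (0, 1) plays
# the role of eta.
h = lambda eta: eta - 0.3 - 0.5 * eta ** 2

eta = solve_eta(h)                 # unique root of the stand-in equation on (0, 1)
gamma = 1.0 - eta                  # from sum_{j>=0} gamma * eta**j = gamma/(1-eta) = 1
pi = gamma * eta ** np.arange(50)  # pi_j = (1-eta) * eta**j, truncated at j = 49
print(eta, gamma, pi.sum())        # the truncated sum is very close to 1
```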
3.3.3 The Long-run Average Reward per Time Unit
A very useful applied probability model is the Markov chain model on which a
reward or cost structure is imposed. Suppose that a reward f(j) is earned each
time the Markov chain visits state j for j ∈ I. The ergodic theorem shows how
to compute the long-run average reward per time unit in terms of the equilibrium
probabilities π_j. In addition to Assumption 3.3.1 involving the regeneration state
r, we need the following assumption.
Assumption 3.3.2 (a) The total reward earned between two visits of the Markov
chain to state r has a finite expectation and

$$\sum_{j \in I} |f(j)|\,\pi_j < \infty.$$

(b) For each initial state X_0 = i with i ≠ r, the total reward earned until the
first visit of the Markov chain to state r is finite with probability 1.
This assumption is automatically satisfied when the Markov chain has a finite
state space and satisfies Assumption 3.3.1.
Theorem 3.3.3 Suppose the Markov chain {X_n} satisfies Assumptions 3.3.1 and
3.3.2. Then the long-run average reward per time unit is

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j \in I} f(j)\,\pi_j \qquad \text{with probability 1}$$

for each initial state X_0 = i.
Intuitively this theorem is obvious by noting that the long-run average number
of visits to state j per time unit equals π_j with probability 1 for each state j ∈ I.
A formal proof of Theorem 3.3.3 is given in Section 3.5.2.
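The statement of Theorem 3.3.3 is also easy to check numerically. The following sketch uses a small illustrative three-state chain; the transition matrix P, the reward vector f and all numbers are made up for the illustration and do not come from the text. It solves the equilibrium equations for π, evaluates ∑_{j∈I} f(j)π_j, and compares this with the simulated time average (1/n)∑_{k=1}^n f(X_k).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative three-state Markov chain with reward f(j) per visit to state j.
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
f = np.array([0.0, 1.0, 4.0])

# Equilibrium probabilities: solve pi P = pi together with sum_j pi_j = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print("sum_j f(j) pi_j       :", f @ pi)     # average reward predicted by the theorem

# Time average (1/n) * sum_{k=1}^{n} f(X_k) along one simulated path.
n, x, total = 200_000, 0, 0.0
for _ in range(n):
    x = rng.choice(3, p=P[x])
    total += f[x]
print("simulated time average:", total / n)  # close to the predicted value for large n
```

For large n the two printed numbers agree up to the usual Monte Carlo noise, which is exactly the "with probability 1" convergence asserted by the theorem.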
Remark 3.3.1 A useful modification of Theorem 3.3.3
In Theorem 3.3.3 the reward function refers to an immediate reward f(j) that is
earned each time the Markov chain visits state j. In practical applications, however,
it often happens that rewards are earned gradually during the time between the state
transitions of the Markov chain. For such situations, define the reward function
f(j) by

f(j) = the expected reward earned until the next state transition
       when a state transition has just occurred to state j.