reason we prefer to present a simple-minded discretization approach for the general
reward case. For fixed t > 0, let
R(t) = the cumulative reward earned up to time t.
Assume that for each state j ∈ I the joint probability distribution function P {R(t) ≤
x, X(t) = j} has a density with respect to the reward variable x (a sufficient
condition is that r(j) > 0 for all j ∈ I). Then we can represent P {R(t) ≤ x} as
P {R(t) ≤ x} = Σ_{j∈I} ∫_0^x f_j(t, y) dy,    x ≥ 0,
where f_j(t, x) is the joint probability density of the cumulative reward up to time
t and the state of the process at time t. The idea is to discretize the reward variable
x and the time variable t in multiples of Δ, where Δ > 0 is chosen sufficiently
small (the probability of more than one state transition in a time period of length
Δ should be negligibly small). The discretized reward variable x can be restricted
to multiples of Δ when the following assumptions are made (a small numerical
illustration is given after the list):
(a) the reward rates r(j) are non-negative integers,
(b) the non-negative lump rewards F_jk are multiples of Δ.
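As a small illustration of why these assumptions keep the discretized reward on the
grid (the specific numbers are ours, not from the text): take Δ = 0.01, r(j) = 2 and
F_kj = 5Δ. Over a time step of length Δ spent in state j the reward grows by
r(j)Δ = 0.02, and a jump from k to j adds F_kj = 0.05; both increments are multiples
of Δ, so every reachable value of the discretized reward variable is again a multiple
of Δ.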
For practical applications it is no restriction to make these assumptions. How do
we compute P {R(t) ≤ x} for fixed t and x? It is convenient to assume a probability
distribution
α_i = P {X(0) = i},    i ∈ I
for the initial state of the process. In view of the probabilistic interpretation
f_j(t, x)Δx ≈ P {x ≤ R(t) < x + Δx, X(t) = j}    for Δx small,
we approximate for fixed Δ > 0 the density f_j(u, y) by a discretized function
f_j^Δ(τ, r). The discretized variables τ and r run through multiples of Δ. For fixed
Δ > 0 the discretized functions f_j^Δ(τ, r) are defined by the recursion scheme
f_j^Δ(τ, r) = f_j^Δ(τ − Δ, r − r(j)Δ)(1 − ν_j Δ)
              + Σ_{k≠j} f_k^Δ(τ − Δ, r − r(k)Δ − F_kj) q_kj Δ
for τ = Δ, 2Δ, . . . , (t/Δ)Δ and r = 0, Δ, . . . , (x/Δ)Δ (for ease assume that x
and t are multiples of Δ). For any j ∈ I, the boundary conditions are
f_j^Δ(0, r) =  α_j/Δ,   if r = 0,
               0,        otherwise,
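To make the recursion scheme concrete, the following Python sketch (the names
reward_cdf, r_rate and F are ours) carries out the boundary conditions and the
recursion for a finite state space, under assumptions (a) and (b) above. Its last line
uses the Riemann-sum approximation P {R(t) ≤ x} ≈ Σ_{j∈I} Σ_{r=0,Δ,...,x} f_j^Δ(t, r)Δ,
the natural discrete counterpart of the integral representation given earlier, which is
not spelled out in this excerpt.

import numpy as np

def reward_cdf(t, x, delta, alpha, q, r_rate, F):
    """Approximate P{R(t) <= x} with the discretization recursion.

    A sketch under the section's assumptions: the reward rates r(j) are
    non-negative integers and the lump rewards F[k][j] are multiples of
    delta, so the discretized reward stays on the grid 0, delta, 2*delta, ...

    alpha  : initial distribution alpha_i = P{X(0) = i}
    q      : transition rates q_kj for k != j (diagonal entries are ignored)
    r_rate : reward rate r(j) earned per unit time in state j
    F      : lump reward F_kj earned when the process jumps from k to j
    """
    n = len(alpha)
    nu = np.array([sum(q[j][k] for k in range(n) if k != j) for j in range(n)])
    n_r = int(round(x / delta)) + 1          # reward grid 0, delta, ..., x
    n_t = int(round(t / delta))              # number of time steps of size delta

    # boundary condition: f_j(0, r) = alpha_j / delta if r = 0, else 0
    f = np.zeros((n, n_r))
    f[:, 0] = np.asarray(alpha, dtype=float) / delta

    for _ in range(n_t):                     # advance tau -> tau + delta
        f_new = np.zeros_like(f)
        for j in range(n):
            for ri in range(n_r):            # ri is the reward in units of delta
                val = 0.0
                # no transition in the last delta: reward grew by r(j)*delta
                prev = ri - r_rate[j]
                if prev >= 0:
                    val += f[j, prev] * (1.0 - nu[j] * delta)
                # a jump from k to j: reward grew by r(k)*delta plus the lump F_kj
                for k in range(n):
                    if k == j:
                        continue
                    prev_k = ri - r_rate[k] - int(round(F[k][j] / delta))
                    if prev_k >= 0:
                        val += f[k, prev_k] * q[k][j] * delta
                f_new[j, ri] = val
        f = f_new

    # Riemann-sum approximation of sum_j integral_0^x f_j(t, y) dy
    return float(f.sum() * delta)

Since the recursion only refers to the previous time level τ − Δ, the sketch stores a
single (state, reward-grid) array and overwrites it at each step; smaller values of
delta improve the approximation at the cost of more grid points and more steps. For
instance, reward_cdf(1.0, 2.0, 0.01, [1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]], [1, 3],
[[0.0, 0.0], [0.0, 0.0]]) would approximate P {R(1) ≤ 2} for a hypothetical two-state
chain without lump rewards.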