Page 243 - A First Course In Stochastic Models
236 DISCRETE-TIME MARKOV DECISION PROCESSES
i + 1. The equipment deteriorates in time. If the present working condition is i
and no repair is done, then at the beginning of the next day the equipment has
working condition j with probability q_ij. It is assumed that q_ij = 0 for j < i and
Σ_{j≥i} q_ij = 1. The working condition i = N represents a malfunction that requires
an enforced repair taking two days. For the intermediate states i with 1 < i < N
there is a choice between preventively repairing the equipment and letting the
equipment operate for the present day. A preventive repair takes only one day. A
repaired system has the working condition i = 1. The cost of an enforced repair
upon failure is C_f and the cost of a preventive repair in working condition i
is C_pi. We wish to determine a maintenance rule which minimizes the long-run
average repair cost per day.
This problem can be put in the framework of a discrete-time Markov decision
model. Since an enforced repair takes two days and the state of the system
has to be defined at the beginning of each day, we need an auxiliary state for the
situation in which an enforced repair is in progress already for one day. Thus the
set of possible states of the system is chosen as
I = {1, 2, . . . , N, N + 1}.
State i with 1 ≤ i ≤ N corresponds to the situation in which an inspection reveals
working condition i, while state N + 1 corresponds to the situation in which an
enforced repair is in progress already for one day. Define the actions
a = 0 if no repair is done,
a = 1 if a preventive repair is done,
a = 2 if an enforced repair is done.
The set of possible actions in state i is chosen as
A(1) = {0}, A(i) = {0, 1} for 1 < i < N, A(N) = A(N + 1) = {2}.
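As an illustration (not from the text), these action sets can be encoded as a small function, and every rule that fixes one admissible action per state can then be enumerated. The instance size N = 5 below is a hypothetical choice; since only the intermediate states offer a real choice, there are 2^(N-2) such rules.

```python
from itertools import product

N = 5  # hypothetical number of working conditions


def A(i):
    """Admissible action set in state i, as defined in the text."""
    if i == 1:
        return (0,)          # new equipment: no repair option needed
    if 1 < i < N:
        return (0, 1)        # operate, or repair preventively
    return (2,)              # i = N or i = N + 1: enforced repair


# One action per state => prod_i |A(i)| = 2**(N-2) rules in total.
policies = [dict(zip(range(1, N + 2), choice))
            for choice in product(*(A(i) for i in range(1, N + 2)))]
print(len(policies))  # 2**(N-2) = 8
```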
The one-step transition probabilities p_ij(a) are given by

p_ij(0) = q_ij for 1 ≤ i < N,
p_i1(1) = 1 for 1 < i < N,
p_{N,N+1}(2) = p_{N+1,1}(2) = 1,

and the other p_ij(a) = 0. The one-step costs c_i(a) are given by

c_i(0) = 0, c_i(1) = C_pi, c_N(2) = C_f and c_{N+1}(2) = 0.
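The model data above can be assembled into a transition matrix and cost vector for any fixed rule. A minimal sketch, assuming a hypothetical instance with N = 5 and illustrative values for q_ij, C_f and C_pi; it evaluates the long-run average cost per day of the rule that repairs only upon failure, by solving for the stationary distribution of the Markov chain that this rule induces:

```python
import numpy as np

# Hypothetical instance: N = 5 working conditions plus auxiliary state N + 1.
N = 5

# Assumed deterioration probabilities q_ij (q_ij = 0 for j < i, rows sum to 1).
q = np.array([
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.00, 0.80, 0.10, 0.05, 0.05],
    [0.00, 0.00, 0.70, 0.20, 0.10],
    [0.00, 0.00, 0.00, 0.50, 0.50],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])

C_f = 10.0                               # assumed enforced-repair cost
C_p = {i: 5.0 for i in range(2, N)}      # assumed preventive-repair costs C_pi


def induced_chain(policy):
    """Transition matrix P and cost vector c of the chain induced by a
    rule 'policy' mapping each state i to an action in A(i)."""
    n = N + 1
    P = np.zeros((n, n))
    c = np.zeros(n)
    for i in range(1, n + 1):
        a = policy[i]
        if a == 0:                       # let the equipment operate
            P[i - 1, :N] = q[i - 1]
        elif a == 1:                     # preventive repair: one day, back to 1
            P[i - 1, 0] = 1.0
            c[i - 1] = C_p[i]
        elif i == N:                     # enforced repair, first day
            P[i - 1, N] = 1.0            # move to auxiliary state N + 1
            c[i - 1] = C_f
        else:                            # i = N + 1: second repair day
            P[i - 1, 0] = 1.0
    return P, c


def average_cost(P, c):
    """Long-run average cost per day: solve pi P = pi, sum(pi) = 1."""
    n = len(c)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi @ c


# Rule that repairs only upon failure (a = 0 in states 1, ..., N - 1).
failure_only = {i: 0 for i in range(1, N)}
failure_only.update({N: 2, N + 1: 2})
P, c = induced_chain(failure_only)
assert np.allclose(P.sum(axis=1), 1.0)
print(f"average cost per day: {average_cost(P, c):.3f}")
```

The same routine evaluates any of the 2^(N-2) stationary rules, which is the computation the algorithms of the next sections organize efficiently.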
Stationary policies
We now introduce some concepts that will be needed in the algorithms to be
described in the next sections. A rule or policy for controlling the system is a
prescription for taking actions at each decision epoch. In principle a control rule