Page 243 - A First Course In Stochastic Models

236             DISCRETE-TIME MARKOV DECISION PROCESSES

                i + 1. The equipment deteriorates in time. If the present working condition is i
                and no repair is done, then at the beginning of the next day the equipment has
                working condition j with probability q_ij. It is assumed that q_ij = 0 for j < i
                and ∑_{j≥i} q_ij = 1. The working condition i = N represents a malfunction that
                requires an enforced repair taking two days. For the intermediate states i with 1 < i < N
                there is a choice between preventively repairing the equipment and letting the
                equipment operate for the present day. A preventive repair takes only one day. A
                repaired system has the working condition i = 1. The cost of an enforced repair
                upon failure is C_f and the cost of a preventive repair in working condition i
                is C_pi. We wish to determine a maintenance rule which minimizes the long-run
                average repair cost per day.
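To get a feel for the trade-off between cheap preventive repairs and expensive enforced repairs, the process can be simulated under a simple threshold rule ("preventively repair as soon as the working condition reaches m"). The sketch below is not from the text: the instance size N = 5, the deterioration probabilities q_ij, and the costs C_f and C_p are all assumed values chosen for illustration.

```python
import random

# Hypothetical instance (all numbers are assumptions, not from the text).
random.seed(1)
N = 5
# Deterioration probabilities q_ij: zero for j < i, each row sums to 1.
q = {1: [(1, 0.90), (2, 0.05), (3, 0.03), (4, 0.01), (5, 0.01)],
     2: [(2, 0.80), (3, 0.10), (4, 0.05), (5, 0.05)],
     3: [(3, 0.70), (4, 0.15), (5, 0.15)],
     4: [(4, 0.50), (5, 0.50)]}
C_f = 10.0                          # cost of an enforced repair (assumed)
C_p = {2: 2.0, 3: 3.0, 4: 4.0}      # preventive-repair costs C_pi (assumed)

def next_condition(i):
    """Draw the next working condition from row i of q."""
    u, acc = random.random(), 0.0
    for j, prob in q[i]:
        acc += prob
        if u <= acc:
            return j
    return N

def average_cost(m, days=200_000):
    """Estimate the long-run average cost per day of the threshold rule m."""
    i, total, t = 1, 0.0, 0
    while t < days:
        if i == N:                  # enforced repair: two days, cost C_f
            total += C_f
            t += 2
            i = 1
        elif i >= m:                # preventive repair: one day, cost C_pi
            total += C_p[i]
            t += 1
            i = 1
        else:                       # operate today; condition deteriorates
            t += 1
            i = next_condition(i)
    return total / t

for m in (2, 3, 4):
    print(f"threshold m = {m}: estimated average cost/day = {average_cost(m):.3f}")
```

Comparing the estimates across thresholds m shows the structure the optimal policy must balance: repairing too eagerly pays C_pi often, while waiting too long incurs C_f plus a lost second day.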
                  This problem can be put in the framework of a discrete-time Markov decision
                model. Since an enforced repair takes two days and the state of the system
                has to be defined at the beginning of each day, we need an auxiliary state for
                the situation in which an enforced repair has already been in progress for one
                day. Thus the
                set of possible states of the system is chosen as

                                       I = {1, 2, . . . , N, N + 1}.
                State i with 1 ≤ i ≤ N corresponds to the situation in which an inspection reveals
                working condition i, while state N + 1 corresponds to the situation in which an
                enforced repair has already been in progress for one day. Define the actions

                                    
                                 a = 0  if no repair is done,
                                     1  if a preventive repair is done,
                                     2  if an enforced repair is done.
                The set of possible actions in state i is chosen as

                 A(1) = {0},  A(i) = {0, 1} for 1 < i < N,  A(N) = A(N + 1) = {2}.
                The one-step transition probabilities p_ij(a) are given by

                                       p_ij(0) = q_ij  for 1 ≤ i < N,
                                       p_i1(1) = 1  for 1 < i < N,
                                       p_N,N+1(2) = p_N+1,1(2) = 1,

                and the other p_ij(a) = 0. The one-step costs c_i(a) are given by

                        c_i(0) = 0,  c_i(1) = C_pi,  c_N(2) = C_f  and  c_N+1(2) = 0.
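The full model specification (state space, action sets, transition probabilities and one-step costs) can be written down directly in code. The following sketch uses a hypothetical instance with N = 5; the deterioration probabilities q and the costs C_f, C_p are assumed values, and the closing assertion checks that every feasible (state, action) pair yields a proper probability distribution.

```python
# Hypothetical instance with N = 5 working conditions (numbers assumed).
N = 5
states = list(range(1, N + 2))      # states 1..N plus auxiliary state N+1

# Upper-triangular deterioration probabilities q_ij (rows sum to 1).
q = {1: {1: 0.90, 2: 0.05, 3: 0.03, 4: 0.01, 5: 0.01},
     2: {2: 0.80, 3: 0.10, 4: 0.05, 5: 0.05},
     3: {3: 0.70, 4: 0.15, 5: 0.15},
     4: {4: 0.50, 5: 0.50}}

C_f = 10.0                          # enforced-repair cost (assumed)
C_p = {2: 2.0, 3: 3.0, 4: 4.0}      # preventive-repair costs C_pi (assumed)

def A(i):
    """Feasible action set A(i), as defined in the text."""
    if i == 1:
        return {0}
    if 1 < i < N:
        return {0, 1}
    return {2}                      # i = N or i = N + 1

def p(i, j, a):
    """One-step transition probabilities p_ij(a)."""
    if a == 0 and 1 <= i < N:
        return q[i].get(j, 0.0)
    if a == 1 and 1 < i < N:
        return 1.0 if j == 1 else 0.0
    if a == 2:
        if i == N:                  # first day of an enforced repair
            return 1.0 if j == N + 1 else 0.0
        if i == N + 1:              # second day: back to condition 1
            return 1.0 if j == 1 else 0.0
    return 0.0

def c(i, a):
    """One-step costs c_i(a)."""
    if a == 1:
        return C_p[i]
    if a == 2 and i == N:
        return C_f
    return 0.0                      # a = 0, or second enforced-repair day

# Each feasible (i, a) must define a probability distribution over states.
for i in states:
    for a in A(i):
        assert abs(sum(p(i, j, a) for j in states) - 1.0) < 1e-12
```

Note that the whole two-day enforced repair is charged as C_f in state N (c_N(2) = C_f) and costs nothing in the auxiliary state N + 1, matching the one-step costs above.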


                Stationary policies
                We now introduce some concepts that will be needed in the algorithms to be
                described in the next sections. A rule or policy for controlling the system is a
                prescription for taking actions at each decision epoch. In principle a control rule