Page 242 - A First Course In Stochastic Models

                THE MODEL

                that are sequentially made when the system evolves over time. An infinite plan-
                ning horizon is assumed and the goal is to find a control rule which minimizes the
                long-run average cost per time unit.
                  A typical example of a controlled dynamic system is an inventory system with
                stochastic demands where the inventory position is periodically reviewed. The deci-
                sions taken at the review times consist of ordering a certain amount of the product
                depending on the inventory position. The economic consequences of the decisions
                are reflected in ordering, inventory and shortage costs.
                  We now introduce the Markov decision model. Consider a dynamic system which
                is reviewed at equidistant points of time t = 0, 1, . . . . At each review the system
                is classified into one of a possible number of states and subsequently a decision
                has to be made. The set of possible states is denoted by I. For each state i ∈ I,
                a set A(i) of decisions or actions is given. The state space I and the action sets
                A(i) are assumed to be finite. The economic consequences of the decisions taken at
                the review times (decision epochs) are reflected in costs. This controlled dynamic
                system is called a discrete-time Markov model when the following Markovian
                property is satisfied. If at a decision epoch the action a is chosen in state i, then
                regardless of the past history of the system, the following happens:

                (a) an immediate cost c_i(a) is incurred,

                (b) at the next decision epoch the system will be in state j with probability p_ij(a),
                   where

                                        ∑_{j∈I} p_ij(a) = 1,   i ∈ I.
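The ingredients of the model can be sketched in code. The following is a minimal illustration with a hypothetical two-state, two-action example (the numbers are made up, not from the text): the state space I, the action sets A(i), the one-step costs c_i(a) and the one-step transition probabilities p_ij(a) are stored in plain dictionaries, and the Markovian requirement that the p_ij(a) sum to one over j is checked.

```python
# Hypothetical discrete-time Markov decision model (illustrative numbers only).

states = [0, 1]                       # the state space I
actions = {0: ["keep", "order"],      # the action sets A(i)
           1: ["keep"]}

# one-step costs c_i(a), indexed by (state, action)
cost = {(0, "keep"): 5.0, (0, "order"): 8.0, (1, "keep"): 1.0}

# one-step transition probabilities p_ij(a): (i, a) -> {j: p_ij(a)}
prob = {(0, "keep"):  {0: 0.9, 1: 0.1},
        (0, "order"): {0: 0.2, 1: 0.8},
        (1, "keep"):  {0: 0.3, 1: 0.7}}

# the requirement from the model definition: for each i in I and each
# a in A(i), the probabilities p_ij(a) must sum to 1 over j in I
for i in states:
    for a in actions[i]:
        assert abs(sum(prob[(i, a)].values()) - 1.0) < 1e-12
```

Note that the data depend only on the current state i and the chosen action a, never on the earlier history of the process; this is exactly the Markovian property stated above.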

                Note that the one-step costs c_i(a) and the one-step transition probabilities p_ij(a)
                are assumed to be time homogeneous. In specific problems the ‘immediate’ costs
                c_i(a) will often represent the expected cost incurred until the next decision epoch
                when action a is chosen in state i. Also, it should be emphasized that the choice
                of the state space and of the action sets often depends on the cost structure of
                the specific problem considered. For example, in a production/inventory problem
                involving a fixed set-up cost for restarting production after an idle period, the
                state description should include a state variable indicating whether the production
                facility is on or off. Many practical control problems can be modelled as a Markov
                decision process by an appropriate choice of the state space and action sets. Before
                we develop the required theory for the average cost criterion, we give a typical
                example of a Markov decision problem.
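To make the average cost criterion concrete, here is a rough sketch (with hypothetical two-state data, not taken from the text) of the long-run average cost per time unit under a fixed stationary policy f, which always chooses action f(i) in state i. Such a policy turns the model into an ordinary Markov chain with one-step probabilities p_ij(f(i)); for an ergodic chain the average cost equals ∑_i π_i c_i(f(i)), where π is the equilibrium distribution of that chain.

```python
# Average cost of a stationary policy in a hypothetical two-state model.
cost = {(0, "order"): 8.0, (1, "keep"): 1.0}          # c_i(a)
prob = {(0, "order"): {0: 0.2, 1: 0.8},               # p_ij(a)
        (1, "keep"):  {0: 0.3, 1: 0.7}}
policy = {0: "order", 1: "keep"}                      # stationary policy f

# equilibrium distribution of the induced two-state chain:
# pi_0 = p_10 / (p_01 + p_10),  pi_1 = 1 - pi_0
p01 = prob[(0, policy[0])][1]
p10 = prob[(1, policy[1])][0]
pi0 = p10 / (p01 + p10)
pi1 = 1.0 - pi0

# long-run average cost per time unit: sum_i pi_i * c_i(f(i))
avg_cost = pi0 * cost[(0, policy[0])] + pi1 * cost[(1, policy[1])]
print(round(avg_cost, 4))  # prints 2.9091
```

The optimization problem of this chapter is to search over such policies for one whose average cost is smallest; the theory developed below shows that attention can indeed be restricted to stationary policies.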

                Example 6.1.1 A maintenance problem

                At the beginning of each day a piece of equipment is inspected to reveal its actual
                working condition. The equipment will be found in one of the working conditions
                i = 1, . . . , N, where the working condition i is better than the working condition