that are sequentially made when the system evolves over time. An infinite planning horizon is assumed and the goal is to find a control rule which minimizes the long-run average cost per time unit.
A typical example of a controlled dynamic system is an inventory system with stochastic demands where the inventory position is periodically reviewed. The decisions taken at the review times consist of ordering a certain amount of the product depending on the inventory position. The economic consequences of the decisions are reflected in ordering, inventory and shortage costs.
We now introduce the Markov decision model. Consider a dynamic system which
is reviewed at equidistant points of time t = 0, 1, . . . . At each review the system
is classified into one of a number of possible states and subsequently a decision
has to be made. The set of possible states is denoted by I. For each state i ∈ I,
a set A(i) of decisions or actions is given. The state space I and the action sets
A(i) are assumed to be finite. The economic consequences of the decisions taken at
the review times (decision epochs) are reflected in costs. This controlled dynamic
system is called a discrete-time Markov decision model when the following Markovian
property is satisfied. If at a decision epoch the action a is chosen in state i, then
regardless of the past history of the system, the following happens:
(a) an immediate cost $c_i(a)$ is incurred,
(b) at the next decision epoch the system will be in state j with probability $p_{ij}(a)$, where
$$\sum_{j \in I} p_{ij}(a) = 1, \qquad i \in I.$$
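To make ingredients (a) and (b) concrete, the following is a minimal Python sketch of how such a model might be stored and checked; the states, actions, costs and probabilities are hypothetical illustrative numbers, not taken from any specific problem. It records the one-step costs $c_i(a)$ and transition probabilities $p_{ij}(a)$ as dictionaries, verifies the normalization condition above, and estimates by simulation the long-run average cost per time unit of a fixed stationary policy, i.e. a rule prescribing a single action for each state.

```python
import random

# States, action sets A(i), one-step costs c_i(a) and transition
# probabilities p_ij(a); all numbers are illustrative, not from the text.
states = [0, 1]
actions = {0: ["a", "b"], 1: ["a"]}
cost = {(0, "a"): 1.0, (0, "b"): 3.0, (1, "a"): 2.0}
prob = {
    (0, "a"): {0: 0.5, 1: 0.5},   # p_0j("a")
    (0, "b"): {0: 0.9, 1: 0.1},   # p_0j("b")
    (1, "a"): {0: 0.2, 1: 0.8},   # p_1j("a")
}

# Verify the normalization condition: sum_j p_ij(a) = 1 for every i and a.
for i in states:
    for a in actions[i]:
        assert abs(sum(prob[(i, a)].values()) - 1.0) < 1e-12

def average_cost(policy, n_steps=100_000, start=0):
    """Estimate the long-run average cost per time unit of a fixed
    stationary policy (a dict mapping each state to one action)."""
    i, total = start, 0.0
    for _ in range(n_steps):
        a = policy[i]
        total += cost[(i, a)]                       # immediate cost c_i(a)
        js = list(prob[(i, a)])
        weights = [prob[(i, a)][j] for j in js]
        i = random.choices(js, weights=weights)[0]  # next state drawn from p_ij(a)
    return total / n_steps

print(average_cost({0: "a", 1: "a"}))
```

For the numbers shown, the policy that always chooses action "a" induces a Markov chain with equilibrium distribution (2/7, 5/7), so the true average cost is 2/7 · 1 + 5/7 · 2 = 12/7 ≈ 1.71, and the simulated estimate should come out close to that value.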
Note that the one-step costs $c_i(a)$ and the one-step transition probabilities $p_{ij}(a)$ are assumed to be time homogeneous. In specific problems the ‘immediate’ costs $c_i(a)$ will often represent the expected cost incurred until the next decision epoch
when action a is chosen in state i. Also, it should be emphasized that the choice
of the state space and of the action sets often depends on the cost structure of
the specific problem considered. For example, in a production/inventory problem
involving a fixed set-up cost for restarting production after an idle period, the
state description should include a state variable indicating whether the production
facility is on or off. Many practical control problems can be modelled as a Markov
decision process by an appropriate choice of the state space and action sets. Before
we develop the required theory for the average cost criterion, we give a typical
example of a Markov decision problem.
Example 6.1.1 A maintenance problem
At the beginning of each day a piece of equipment is inspected to reveal its actual
working condition. The equipment will be found in one of the working conditions
i = 1, . . . , N, where the working condition i is better than the working condition