Page 295 - A First Course In Stochastic Models

VALUE ITERATION AND FICTITIOUS DECISIONS          289

and τ_s(0) = 1/ν(i_1, i_2). For action a = 1 in state s = (i_1, i_2, 1),

\[
p_{sv}(1) =
\begin{cases}
\lambda_1/\nu(i_1+1, i_2), & v = (i_1+1, i_2, 1),\\
\lambda_2/\nu(i_1+1, i_2), & v = (i_1+1, i_2, 2),\\
(i_1+1)\mu_1/\nu(i_1+1, i_2), & v = (i_1, i_2, 0),\\
i_2\mu_2/\nu(i_1+1, i_2), & v = (i_1+1, i_2-1, 0),
\end{cases}
\]
and τ_s(1) = 1/ν(i_1+1, i_2). The specification for action a = 1 in state (i_1, i_2, 2) is similar. Finally, the one-step expected costs c_s(a) are simply given by

\[
c_s(a) =
\begin{cases}
1, & s = (i_1, i_2, 1) \text{ and } a = 0,\\
1, & s = (i_1, i_2, 2) \text{ and } a = 0,\\
0, & \text{otherwise}.
\end{cases}
\]
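As a concrete illustration, the model elements just specified can be encoded directly. The sketch below is not from the book: the rate function ν(i_1, i_2) = λ_1 + λ_2 + i_1µ_1 + i_2µ_2 and the parameter values are assumptions for illustration, and the transition probabilities for action a = 0 are read off from the value-iteration equations that follow.

```python
# Sketch of the model elements, under the assumption that
# nu(i1, i2) = lam1 + lam2 + i1*mu1 + i2*mu2 (total transition rate).
lam1, lam2 = 1.0, 1.5   # arrival rates of the two message types (illustrative)
mu1, mu2 = 2.0, 1.0     # per-channel service rates (illustrative)

def nu(i1, i2):
    """Total transition rate in a state with i1, i2 busy channels (assumed)."""
    return lam1 + lam2 + i1 * mu1 + i2 * mu2

def transitions(i1, i2, k, a):
    """One-step transition probabilities p_sv(a) and mean transition time
    tau_s(a) for state s = (i1, i2, k).  Only a = 0 (leave alone / reject)
    and a = 1 in a state with k = 1 are sketched, mirroring the text."""
    if a == 0:
        n = nu(i1, i2)
        probs = {(i1, i2, 1): lam1 / n,
                 (i1, i2, 2): lam2 / n,
                 (i1 - 1, i2, 0): i1 * mu1 / n,
                 (i1, i2 - 1, 0): i2 * mu2 / n}
    else:  # a = 1 in state (i1, i2, 1): the accepted message occupies a channel
        n = nu(i1 + 1, i2)
        probs = {(i1 + 1, i2, 1): lam1 / n,
                 (i1 + 1, i2, 2): lam2 / n,
                 (i1, i2, 0): (i1 + 1) * mu1 / n,
                 (i1 + 1, i2 - 1, 0): i2 * mu2 / n}
    return probs, 1.0 / n

def cost(i1, i2, k, a):
    """One-step expected cost c_s(a): each rejection (a = 0 in a state with
    k = 1 or k = 2) costs 1, everything else is free."""
    return 1.0 if k in (1, 2) and a == 0 else 0.0
```

Since the probabilities share the common denominator ν, each row of `transitions` sums to one by construction.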

                Value-iteration algorithm
                Now, having specified the basic elements of the semi-Markov decision model, we
                are in a position to formulate the value-iteration algorithm for the computation of
                a (nearly) optimal acceptance rule. In the data transformation, we take
\[
\tau = \frac{1}{\lambda_1 + \lambda_2 + c_1\mu_1 + c_2\mu_2}.
\]
Using the above specifications, the value-iteration scheme becomes quite simple for
the allocation problem. Note that the expressions for the one-step transition times
τ_s(a) and the one-step transition probabilities p_sv(a) have a common denominator,
and so the ratio p_sv(a)/τ_s(a) has a very simple form. In specifying the value-iteration
scheme (7.2.3), we distinguish between the auxiliary states (i_1, i_2, 0) and
the other states. In the states (i_1, i_2, 0) the only possible decision is to leave the
system alone. Thus

\[
V_n(i_1, i_2, 0) = \tau\lambda_1 V_{n-1}(i_1, i_2, 1) + \tau\lambda_2 V_{n-1}(i_1, i_2, 2) + \tau i_1\mu_1 V_{n-1}(i_1-1, i_2, 0)
+ \tau i_2\mu_2 V_{n-1}(i_1, i_2-1, 0) + \{1 - \tau\nu(i_1, i_2)\} V_{n-1}(i_1, i_2, 0),
\]

where V_{n-1}(i_1, i_2, 0) = 0 when i_1 < 0 or i_2 < 0. For the states (i_1, i_2, 1),

\[
\begin{aligned}
V_n(i_1, i_2, 1) = \min\Big[\; &\nu(i_1, i_2) + \tau\lambda_1 V_{n-1}(i_1, i_2, 1) + \tau\lambda_2 V_{n-1}(i_1, i_2, 2)\\
&+ \tau i_1\mu_1 V_{n-1}(i_1-1, i_2, 0) + \tau i_2\mu_2 V_{n-1}(i_1, i_2-1, 0)\\
&+ \{1 - \tau\nu(i_1, i_2)\} V_{n-1}(i_1, i_2, 1),\\
&\tau\lambda_1 V_{n-1}(i_1+1, i_2, 1) + \tau\lambda_2 V_{n-1}(i_1+1, i_2, 2)\\
&+ \tau(i_1+1)\mu_1 V_{n-1}(i_1, i_2, 0) + \tau i_2\mu_2 V_{n-1}(i_1+1, i_2-1, 0)\\
&+ \{1 - \tau\nu(i_1+1, i_2)\} V_{n-1}(i_1, i_2, 1)\Big],
\end{aligned}
\]
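The recursions above translate almost line by line into code. The following is a minimal sketch, not the book's implementation: it assumes group k consists of c_k channels so that i_k ranges over 0, …, c_k, that ν(i_1, i_2) = λ_1 + λ_2 + i_1µ_1 + i_2µ_2, that action a = 1 in a state (i_1, i_2, 2) is handled symmetrically with the roles of the two groups interchanged, that a message must be rejected when its group is fully occupied, and that the parameter values are illustrative.

```python
# One value-iteration sweep for the allocation problem (sketch under the
# assumptions stated above; parameter values are illustrative only).
c1, c2 = 3, 3                                     # channels per group (assumed)
lam1, lam2, mu1, mu2 = 1.0, 1.5, 2.0, 1.0         # illustrative rates
tau = 1.0 / (lam1 + lam2 + c1 * mu1 + c2 * mu2)   # data transformation

def nu(i1, i2):
    return lam1 + lam2 + i1 * mu1 + i2 * mu2

def sweep(V):
    """Compute V_n from V_{n-1}.  V maps (i1, i2, k) to a value; states
    outside the grid contribute 0, as in the boundary convention above."""
    def v(*s):
        return V.get(s, 0.0)
    Vn = {}
    for i1 in range(c1 + 1):
        for i2 in range(c2 + 1):
            def stay(k):
                # value of leaving the system alone in state (i1, i2, k)
                return (tau * lam1 * v(i1, i2, 1) + tau * lam2 * v(i1, i2, 2)
                        + tau * i1 * mu1 * v(i1 - 1, i2, 0)
                        + tau * i2 * mu2 * v(i1, i2 - 1, 0)
                        + (1 - tau * nu(i1, i2)) * v(i1, i2, k))
            # auxiliary states (i1, i2, 0): no decision to make
            Vn[(i1, i2, 0)] = stay(0)
            # states (i1, i2, 1): reject (one-step cost nu(i1, i2)) or accept
            reject1 = nu(i1, i2) + stay(1)
            accept1 = (tau * lam1 * v(i1 + 1, i2, 1)
                       + tau * lam2 * v(i1 + 1, i2, 2)
                       + tau * (i1 + 1) * mu1 * v(i1, i2, 0)
                       + tau * i2 * mu2 * v(i1 + 1, i2 - 1, 0)
                       + (1 - tau * nu(i1 + 1, i2)) * v(i1, i2, 1))
            Vn[(i1, i2, 1)] = min(reject1, accept1) if i1 < c1 else reject1
            # states (i1, i2, 2): symmetric, with the groups' roles swapped
            reject2 = nu(i1, i2) + stay(2)
            accept2 = (tau * lam1 * v(i1, i2 + 1, 1)
                       + tau * lam2 * v(i1, i2 + 1, 2)
                       + tau * i1 * mu1 * v(i1 - 1, i2 + 1, 0)
                       + tau * (i2 + 1) * mu2 * v(i1, i2, 0)
                       + (1 - tau * nu(i1, i2 + 1)) * v(i1, i2, 2))
            Vn[(i1, i2, 2)] = min(reject2, accept2) if i2 < c2 else reject2
    return Vn
```

Iterating `sweep` and inspecting the minimum and maximum of the differences V_n − V_{n−1} yields the usual lower and upper bounds on the minimal average cost, from which a (nearly) optimal acceptance rule can be read off.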