
Denote by a_ℓ = e^{-µD}(µD)^ℓ/ℓ! the probability of the completion of ℓ service
phases during an interarrival time D when the server is continuously busy. Then
the recursive value-iteration equation (6.6.1) becomes

\[
V_n(i) = \sum_{\ell=0}^{i+r-1} a_\ell V_{n-1}(i+r-\ell)
         + \Bigl(1 - \sum_{\ell=0}^{i+r-1} a_\ell\Bigr) V_{n-1}(0),
         \qquad 0 \le i \le Kr - r,
\]
\[
V_n(i) = 1 + \sum_{\ell=0}^{i-1} a_\ell V_{n-1}(i-\ell)
         + \Bigl(1 - \sum_{\ell=0}^{i-1} a_\ell\Bigr) V_{n-1}(0),
         \qquad Kr - r < i \le Kr.
\]
The discrete-time Markov chain describing the number of service phases present at
the arrival epochs is aperiodic. Hence the lower and upper bounds m_n and M_n from
the value-iteration algorithm both converge to the long-run fraction of customers
who are lost.
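
As an illustration of how these recursions are used, the following sketch (in Python; the function name loss_fraction and its parameter names are illustrative, not from the text) iterates V_n starting from V_0 ≡ 0 and stops once the successive-difference bounds m_n = min_i [V_n(i) − V_{n−1}(i)] and M_n = max_i [V_n(i) − V_{n−1}(i)], the usual value-iteration bounds, have nearly coincided; by the aperiodicity just noted, both bounds converge to the long-run fraction of lost customers.

```python
import math

def loss_fraction(mu, D, r, K, tol=1e-10, max_iter=100_000):
    """Minimal sketch of the value-iteration recursion above.

    States i = 0, ..., K*r count the uncompleted service phases seen at an
    arrival epoch; an accepted customer adds r exponential(mu) phases, and
    a_l = exp(-mu*D) (mu*D)**l / l! is the probability that l phases are
    completed during the constant interarrival time D while the server is
    continuously busy.  (Function and parameter names are illustrative.)
    """
    n_states = K * r + 1
    # Poisson probabilities a_l, computed recursively to avoid factorials.
    a = [math.exp(-mu * D)]
    for l in range(1, n_states):
        a.append(a[-1] * mu * D / l)
    cum = [a[0]]
    for l in range(1, n_states):
        cum.append(cum[-1] + a[l])              # cum[l] = a_0 + ... + a_l

    V_prev = [0.0] * n_states                   # V_0 identically zero
    for _ in range(max_iter):
        V = [0.0] * n_states
        for i in range(n_states):
            if i <= K * r - r:                  # arriving customer accepted
                hi = i + r - 1
                V[i] = sum(a[l] * V_prev[i + r - l] for l in range(hi + 1))
                V[i] += (1.0 - cum[hi]) * V_prev[0]
            else:                               # arriving customer lost: cost 1
                hi = i - 1
                V[i] = 1.0 + sum(a[l] * V_prev[i - l] for l in range(hi + 1))
                V[i] += (1.0 - cum[hi]) * V_prev[0]
        diffs = [V[i] - V_prev[i] for i in range(n_states)]
        m_n, M_n = min(diffs), max(diffs)       # lower and upper bounds
        if M_n - m_n < tol:
            return 0.5 * (m_n + M_n)            # both bounds -> loss fraction
        V_prev = V
    return 0.5 * (m_n + M_n)
```

Starting from V_0 ≡ 0 is a standard initialization; any other starting function would lead to the same limit of the bounds.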



                                 6.7  CONVERGENCE PROOFS

In this section we give convergence proofs for the policy-iteration algorithm and
the value-iteration algorithm. The finite convergence of the policy-iteration
algorithm is proved for the unichain case. For the standard value-iteration
algorithm the convergence of the bounds m_n and M_n to the same limit is proved
under the unichain assumption together with the assumption that the one-step
transition probability p_ii(a) > 0 for all i ∈ I and a ∈ A(i). The latter
aperiodicity assumption is automatically satisfied when the data transformation
discussed in Section 6.6 is applied.
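
For concreteness, one standard way to enforce p_ii(a) > 0 is the perturbation p̃_ij(a) = τ p_ij(a) + (1 − τ)δ_ij with 0 < τ < 1 and costs left unchanged; that this coincides with the transformation of Section 6.6 is assumed here rather than quoted. Each stationary policy keeps its stationary distribution and hence its average cost, while every state acquires a positive self-transition probability. A minimal sketch (the data layout is an illustrative choice):

```python
def make_aperiodic(P, tau=0.5):
    """Aperiodicity perturbation, sketched under the assumption that it
    matches the Section 6.6 data transformation:
        p~_ij(a) = tau * p_ij(a) + (1 - tau) * [j == i],   0 < tau < 1.
    P maps (state, action) to a dict {next_state: probability}; one-step
    costs are not touched, so the average cost of every stationary policy
    is preserved, while p~_ii(a) >= 1 - tau > 0 guarantees aperiodicity.
    """
    P_new = {}
    for (i, a), row in P.items():
        new_row = {j: tau * p for j, p in row.items()}
        new_row[i] = new_row.get(i, 0.0) + (1.0 - tau)
        P_new[(i, a)] = new_row
    return P_new
```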


                Convergence proof for policy iteration
We first establish a lexicographical ordering for the average cost and the relative
values associated with the policies that are generated by the algorithm. For that
purpose we need to standardize the relative value functions, since a relative value
function is not uniquely determined. Let us number or renumber the possible states
as i = 1, ..., N. In view of the fact that the relative values of a given policy
are unique up to an additive constant, the sequence of policies generated by the
algorithm does not depend on the particular choice of the relative value function
for a given policy. For each stationary policy Q, we now consider the particular
relative value function w_i(Q) defined by (6.3.1), where the regeneration state r is
chosen as the largest state in I(Q). The set I(Q) is defined by

         I(Q) = the set of states that are recurrent under policy Q.

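To make this standardization concrete, here is a small numerical sketch (Python with NumPy; the function name and the thresholding of the stationary distribution are illustrative assumptions). It identifies I(Q) as the set of states carrying positive stationary probability under the fixed policy, takes r as the largest such state, and obtains the pair (g(Q), w(Q)) from the average-cost evaluation equations normalized by w_r(Q) = 0; that this normalization reproduces the function defined by (6.3.1) is assumed rather than quoted from the text.

```python
import numpy as np

def standardized_relative_values(P, c):
    """Sketch: standardized relative values for a fixed stationary policy Q
    in the unichain case.

    P[i][j] = p_ij(Q) and c[i] = c_i(Q); the states 1, ..., N of the text
    are stored at indices 0, ..., N-1.  The regeneration state r is the
    largest recurrent state, as in the text, and (g, w) solves
        g + w_i = c_i + sum_j p_ij * w_j,   with  w_r = 0,
    which is assumed to give the same standardization as (6.3.1).
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    N = len(c)

    # I(Q): in the unichain case the recurrent states are exactly those
    # with positive stationary probability under P.
    A_pi = np.vstack([P.T - np.eye(N), np.ones(N)])
    pi = np.linalg.lstsq(A_pi, np.append(np.zeros(N), 1.0), rcond=None)[0]
    r = max(j for j in range(N) if pi[j] > 1e-12)   # largest state in I(Q)

    # Unknowns (g, w_1, ..., w_N); the last row enforces w_r = 0.
    A = np.zeros((N + 1, N + 1))
    A[:N, 0] = 1.0
    A[:N, 1:] = np.eye(N) - P
    A[N, 1 + r] = 1.0
    x = np.linalg.solve(A, np.append(c, 0.0))
    return x[0], x[1:], r                           # g(Q), w(Q), regeneration state
```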
Let R and R̄ be immediate successors in the sequence of policies generated by the
algorithm. Suppose that R̄ ≠ R. We assert that either