then we can recover approximate expectations with respect to π by
taking weighted averages. In this scheme, the state z is given weight
$w_z = \pi_z/\nu_z$. If $Z_0, Z_1, Z_2, \ldots$ is a run from the chain with equilibrium
distribution $\nu$, then under the appropriate ergodic assumptions, prove
that
\[
\lim_{n \to \infty} \frac{\sum_{i=0}^{n-1} w_{Z_i} f(Z_i)}{\sum_{i=0}^{n-1} w_{Z_i}} = \mathrm{E}_\pi[f(X)].
\]
The choice $\nu_z \propto \pi_z^{1/\tau}$ for $\tau > 1$ lowers the peaks and raises the valleys
of $\pi$ [14]. Unfortunately, in practice, if $\nu$ differs too much from $\pi$, then
the estimator
\[
\frac{\sum_{i=0}^{n-1} w_{Z_i} f(Z_i)}{\sum_{i=0}^{n-1} w_{Z_i}}
\]
of the expectation $\mathrm{E}_\pi[f(X)]$ will have a large variance for $n$ of moderate size.
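To make the reweighting scheme concrete, here is a minimal Python sketch. The small bimodal target $\pi$ on twenty states, the flattening parameter $\tau = 2$, the random-walk Metropolis kernel, and the test function $f$ are all illustrative assumptions, not part of the exercise; the point is only that averaging $f(Z_i)$ with weights $w_{Z_i} = \pi_{Z_i}/\nu_{Z_i}$ recovers $\mathrm{E}_\pi[f(X)]$ even though the chain samples from $\nu$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal target pi on states {0, ..., 19}; the arrays pi
# and nu and the parameter tau below are illustrative, not from the text.
x = np.arange(20)
pi = np.exp(-0.5 * (x - 4) ** 2) + np.exp(-0.5 * (x - 15) ** 2)
pi /= pi.sum()

tau = 2.0
nu = pi ** (1.0 / tau)          # flattened sampling distribution nu_z ∝ pi_z^(1/tau)
nu /= nu.sum()
w = pi / nu                     # importance weights w_z = pi_z / nu_z

def metropolis_step(z, target):
    """One symmetric random-walk Metropolis step targeting `target`."""
    prop = (z + rng.choice([-1, 1])) % 20
    if rng.random() < min(1.0, target[prop] / target[z]):
        return prop
    return z

f = lambda z: float(z)          # estimate E_pi[Z] as a test case

z, num, den = 0, 0.0, 0.0
for _ in range(200_000):
    z = metropolis_step(z, nu)  # the chain has equilibrium distribution nu
    num += w[z] * f(z)
    den += w[z]

print("weighted estimate:", num / den)
print("exact E_pi[f]    :", float((pi * x).sum()))
```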
10. Another device to improve mixing of a Markov chain is to run several
parallel chains on the same state space and occasionally swap their
states [9]. If π is the distribution of the chain we wish to sample
from, then let $\pi^{(1)} = \pi$, and define $m - 1$ additional distributions
$\pi^{(2)}, \ldots, \pi^{(m)}$. For instance, incremental heating can be achieved by
taking
\[
\pi^{(k)}_z \propto \pi_z^{\frac{1}{1+(k-1)\tau}}
\]
for $\tau > 0$. At epoch $n$ we sample for each chain $k$ a state $Z_{nk}$ given
the chain's previous state $Z_{n-1,k}$. We then randomly select chain $i$
with probability $1/m$ and consider swapping states between it and chain
$j = i + 1$. (When $i = m$ no swap is performed.) Under appropriate
ergodic assumptions on the $m$ participating chains, show that if the
acceptance probability for the proposed swap is
\[
\min\left\{ 1, \frac{\pi^{(i)}_{Z_{nj}} \, \pi^{(j)}_{Z_{ni}}}{\pi^{(i)}_{Z_{ni}} \, \pi^{(j)}_{Z_{nj}}} \right\},
\]
then the product chain is ergodic with equilibrium distribution given
by the product distribution $\pi^{(1)} \otimes \pi^{(2)} \otimes \cdots \otimes \pi^{(m)}$. The marginal
of this product for chain 1 is just $\pi$. Therefore, we can throw away
the outcomes of chains 2 through $m$ and estimate expectations with
respect to $\pi$ by forming sample averages from the embedded run of
chain 1. (Hint: The fact that no swap may occur at any given step
allows the chains to run independently for an arbitrary number of steps.)
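For readers who want to experiment with the swapping device, the following Python sketch implements it under the same illustrative assumptions as the previous sketch (a small bimodal $\pi$, random-walk Metropolis updates within each chain); the names targets and metropolis_step and the parameter choices $m = 4$, $\tau = 1$ are hypothetical, not prescribed by the exercise. Chain 1 targets $\pi$ itself, and only its embedded run is retained for the sample averages.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical bimodal target pi as in the earlier sketch.
x = np.arange(20)
pi = np.exp(-0.5 * (x - 4) ** 2) + np.exp(-0.5 * (x - 15) ** 2)
pi /= pi.sum()

m, tau = 4, 1.0
# Incrementally heated targets pi^(k) ∝ pi^(1/(1+(k-1)tau)), k = 1,...,m.
targets = [pi ** (1.0 / (1.0 + (k - 1) * tau)) for k in range(1, m + 1)]
targets = [t / t.sum() for t in targets]

def metropolis_step(z, target):
    """One symmetric random-walk Metropolis step targeting `target`."""
    prop = (z + rng.choice([-1, 1])) % 20
    if rng.random() < min(1.0, target[prop] / target[z]):
        return prop
    return z

Z = [0] * m                      # current state of each chain
samples = []                     # keep chain 1 only; chains 2..m are discarded
for _ in range(100_000):
    # Update every chain from its own heated target.
    Z = [metropolis_step(Z[k], targets[k]) for k in range(m)]
    # Select chain i uniformly; when i is the last chain, no swap occurs.
    i = rng.integers(m)
    if i < m - 1:
        j = i + 1
        ratio = (targets[i][Z[j]] * targets[j][Z[i]]) / (
                 targets[i][Z[i]] * targets[j][Z[j]])
        if rng.random() < min(1.0, ratio):
            Z[i], Z[j] = Z[j], Z[i]   # accepted swap of states
    samples.append(Z[0])

# Sample average from the embedded run of chain 1 estimates E_pi[Z].
print("estimate:", np.mean(samples))
print("exact   :", float((pi * x).sum()))
```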
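The design rationale is that adjacent heated targets overlap substantially, so proposed swaps succeed often; the hottest chains cross the valleys of $\pi$ freely, and accepted swaps propagate those crossings down to chain 1, which would otherwise mix slowly between the modes.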