Page 314 - Applied Probability
14. Poisson Approximation
redundancy of a panel is desirable. In practice, the chromosome constitution
of a clone cannot be predicted in advance, and the level of redundancy
is random. Minimum Hamming distance is a natural measure of the redundancy
of a panel. The Hamming distance $\rho(c_s, c_t)$ of two columns $c_s$ and
$c_t$ counts the number of entries in which they differ. The minimum Hamming
distance of a panel is obviously defined as $\min_{\{s,t\}} \rho(c_s, c_t)$, where
$\{s,t\}$ ranges over all pairs of columns from the panel.
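As a small illustration of these two definitions, the following Python sketch computes the Hamming distance between two columns and the minimum Hamming distance of a panel. The function names and the toy panel are hypothetical; the sketch assumes a panel is stored as a list of rows, one row per clone, with one column per chromosome and entries 1 (retained) or 0 (lost).

```python
from itertools import combinations


def hamming_distance(col_s, col_t):
    """Count the entries in which two equal-length columns differ."""
    if len(col_s) != len(col_t):
        raise ValueError("columns must have the same length")
    return sum(a != b for a, b in zip(col_s, col_t))


def min_hamming_distance(panel):
    """Minimum Hamming distance over all pairs of columns of a panel.

    Assumed layout: `panel` is a list of rows, one row per clone,
    and each column corresponds to one chromosome.
    """
    columns = list(zip(*panel))  # transpose rows into columns
    return min(hamming_distance(s, t) for s, t in combinations(columns, 2))


# Made-up toy panel: 4 clones (rows) by 3 chromosomes (columns).
toy_panel = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
```

For the toy panel the three pairwise column distances are 2, 2, and 4, so `min_hamming_distance(toy_panel)` returns 2.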
When somatic cell hybrid panels are randomly created, it is reasonable to
make three assumptions. First, each human chromosome is lost or retained
independently during the formation of a stable clone. Second, there is a
common retention probability p applying to all chromosome pairs. This
means that at least one member of each pair of homologous chromosomes
is retained with probability p. Rushton [17] estimates a range of p from .07
to .75. The value p = 1/2 simplifies our theory considerably. Third, different
clones behave independently in their retention patterns.
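The three assumptions translate directly into a simulation in which every entry of the panel is an independent Bernoulli trial with success probability p. The Python sketch below is one way to generate such a panel; the function name `simulate_panel`, its parameters, and the rows-as-clones layout are illustrative assumptions rather than anything prescribed by the text.

```python
import random


def simulate_panel(n_clones, p, n_chromosomes=23, seed=None):
    """Simulate a random panel under the three independence assumptions.

    Each entry is 1 (chromosome pair retained) with probability p and
    0 (lost) otherwise, independently across chromosomes and clones.
    Assumed layout: one row per clone, one column per chromosome.
    """
    rng = random.Random(seed)  # private generator so runs are reproducible
    return [
        [1 if rng.random() < p else 0 for _ in range(n_chromosomes)]
        for _ in range(n_clones)
    ]


# Example: a panel of 10 clones with retention probability 1/2.
panel = simulate_panel(n_clones=10, p=0.5, seed=1)
```

Passing a fixed `seed` makes the simulated panel repeatable, which is convenient when comparing simulated frequencies with an analytic approximation.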
Now denote column $s$ of a random panel of $n$ clones by $C_s^n$. For any two
distinct columns $C_s^n$ and $C_t^n$, define $X_{\{s,t\}}^n$ to be the indicator of the event
$\rho(C_s^n, C_t^n) < d$, where $d$ is some fixed Hamming distance. The random
variable $Y_d^n = \sum_{\{s,t\}} X_{\{s,t\}}^n$ is 0 precisely when the minimum Hamming
distance equals or exceeds $d$. There are $\binom{23}{2}$ pairs $\alpha = \{s,t\}$ in the index
set $I$, and each of the associated $X_\alpha^n$ has the same mean
$$p_\alpha = \sum_{i=0}^{d-1} \binom{n}{i} q^i (1-q)^{n-i},$$
where $q = 2p(1-p)$ is the probability that $C_s^n$ and $C_t^n$ differ in any entry.
This gives the mean of $Y_d^n$ as $\lambda = \binom{23}{2} p_\alpha$.
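The mean of each indicator is a binomial lower-tail probability, which makes the mean of the pair count easy to evaluate numerically. The Python sketch below computes both quantities; the function names `pair_probability` and `expected_close_pairs` are hypothetical, and the default of 23 columns follows the count of column pairs used in the text.

```python
from math import comb


def pair_probability(n, d, p):
    """Probability that two given columns differ in fewer than d of n entries.

    Each entry differs independently with probability q = 2p(1 - p), so the
    number of differing entries is binomial with n trials and success
    probability q.
    """
    q = 2 * p * (1 - p)
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(d))


def expected_close_pairs(n, d, p, n_columns=23):
    """Mean number of column pairs at Hamming distance less than d."""
    return comb(n_columns, 2) * pair_probability(n, d, p)
```

As a check on the formula, taking d = 1 and p = 1/2 gives q = 1/2, so `pair_probability(n, 1, 0.5)` reduces to (1/2)^n, the chance that two columns agree in every entry.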
The Chen-Stein heuristic suggests estimating $\Pr(Y_d^n > 0)$ by the Poisson
tail probability $1 - e^{-\lambda}$. The error bound (14.3) on this approximation can
be computed by defining the neighborhoods $B_\alpha = \{\beta : |\beta| = 2, \beta \cap \alpha \ne \emptyset\}$,
where vertical bars enclosing a set indicate the number of elements in the
set. It is clear that $X_\alpha^n$ is independent of those $X_\beta^n$ with $\beta$ outside $B_\alpha$.
The Chen-Stein constant $b_1$ reduces to $\binom{23}{2} |B_\alpha| p_\alpha^2$. An elementary counting
argument shows that
$$|B_\alpha| = \binom{23}{2} - \binom{21}{2} = 43.$$
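The neighborhood count can be confirmed by brute-force enumeration, and the same enumeration gives the first Chen-Stein constant once the common indicator mean is known. The Python sketch below is an illustration only: the names `neighborhood`, `poisson_tail`, and `b1` are hypothetical, columns are labeled 0 through 22 for convenience, and the common indicator mean is passed in as the argument `p_alpha` rather than recomputed.

```python
from itertools import combinations
from math import comb, exp

N_COLUMNS = 23  # number of columns, as in the text

# Index set I: all unordered pairs of distinct columns.
pairs = [frozenset(c) for c in combinations(range(N_COLUMNS), 2)]


def neighborhood(alpha):
    """All pairs beta sharing at least one column with alpha (alpha included)."""
    return [beta for beta in pairs if beta & alpha]


def poisson_tail(lam):
    """Poisson estimate 1 - exp(-lam) of the probability of a positive count."""
    return 1.0 - exp(-lam)


def b1(p_alpha):
    """First Chen-Stein constant: (number of pairs) * |B_alpha| * p_alpha^2."""
    size = len(neighborhood(pairs[0]))  # the same for every alpha by symmetry
    return len(pairs) * size * p_alpha**2
```

Enumerating `neighborhood(alpha)` for every pair returns 43 elements each time: the pair itself plus 2 x 21 pairs that share exactly one column with it, in agreement with the counting argument.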
Since the joint probability $p_{\alpha\beta}$ does not depend on the particular pair
$\beta \in B_\alpha \setminus \{\alpha\}$ chosen, the constant $b_2$ is $\binom{23}{2} (|B_\alpha| - 1) p_{\alpha\beta}$. Fortunately,
$p_{\alpha\beta} = p_\alpha^2$ when $p = 1/2$. Indeed, by conditioning on the value of the
common column shared by $\alpha$ and $\beta$, it is obvious in this special case that
the events $X_\alpha^n = 1$ and $X_\beta^n = 1$ are independent and occur with constant
probability $p_\alpha$. The case $p \ne 1/2$ is more subtle, and we defer the details
of computing $p_{\alpha\beta}$ to Problem 8. Table 14.1 provides some representative