Page 85 - Applied Probability


4. Hypothesis Testing and Categorical Data
instance, if we distinguish duplicate alleles by a superscript ∗, then the 3×4
matrix

$$
\begin{pmatrix}
a_1 & a_2 & a_1^* & a_2^* \\
b_3 & b_1 & b_1^* & b_2 \\
c_2 & c_1 & c_3 & c_2^*
\end{pmatrix}
\tag{4.5}
$$

for $m = 3$ loci and $n = 4$ haplotypes represents one out of $(4!)^3$ equally likely matrices and yields the nonzero haplotype counts

$$
n_{a_1 b_3 c_2} = 1, \quad n_{a_2 b_1 c_1} = 1, \quad n_{a_1 b_1 c_3} = 1, \quad n_{a_2 b_2 c_2} = 1.
$$
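Reading the columns of matrix (4.5) and tallying them can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own: each gene is encoded as a hypothetical pair `(allele, copy)`, where `copy == 1` marks the starred duplicate, and the helper names `haplotype_counts` and `uniform_space_size` are illustrative, not from the text. Dropping the duplicate marks and counting columns should reproduce the four unit counts listed above, and the uniform space has $(n!)^m$ members.

```python
from collections import Counter
from math import factorial

# Hypothetical encoding: each gene is (allele, copy); copy == 1 marks the
# starred duplicate, so every entry within a row is distinct.
MATRIX_4_5 = [
    [("a1", 0), ("a2", 0), ("a1", 1), ("a2", 1)],  # locus a
    [("b3", 0), ("b1", 0), ("b1", 1), ("b2", 0)],  # locus b
    [("c2", 0), ("c1", 0), ("c3", 0), ("c2", 1)],  # locus c
]


def haplotype_counts(matrix):
    """Read each column as a haplotype, drop the duplicate marks, and tally."""
    return Counter(
        tuple(allele for allele, _copy in column) for column in zip(*matrix)
    )


def uniform_space_size(m, n):
    """Each of the m rows can be permuted independently in n! ways."""
    return factorial(n) ** m


counts = haplotype_counts(MATRIX_4_5)
for haplotype, count in sorted(counts.items()):
    print("".join(haplotype), count)
print(uniform_space_size(3, 4))
```

Keeping the copy index in the encoding mirrors the superscript ∗ device: it makes the $n$ entries of each row distinct, so all $(n!)^m$ row permutations are distinguishable matrices.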
To count the number of matrices consistent with a haplotype count vector $\{n_i\}$, note that the haplotypes can be assigned to the columns of a typical matrix from the uniform space in $\binom{n}{\{n_i\}}$ ways. Within each such assignment, there are $\prod_{j=1}^{m} \prod_k n_{jk}!$ permutations of the genes of the various allele types among the available positions for each allele type. It follows that the haplotype count vector $\{n_i\}$ has probability

$$
\Pr(\{n_i\})
= \frac{\binom{n}{\{n_i\}} \prod_{j=1}^{m} \prod_k n_{jk}!}{(n!)^m}
= \frac{\binom{n}{\{n_i\}}}{\prod_{j=1}^{m} \binom{n}{\{n_{jk}\}}} .
$$
In other words, we recover the Fisher-Yates distribution.
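As a numerical check of the counting argument, the sketch below evaluates the closed form for $\Pr(\{n_i\})$ and compares it with a brute-force enumeration of all $(n!)^m$ row permutations. It assumes a hypothetical representation of $\{n_i\}$ as a dict from haplotype tuples to counts; the function names are illustrative. For the counts read off matrix (4.5), the formula gives $24/(6 \cdot 12 \cdot 12) = 1/36$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def multinomial(counts):
    """Multinomial coefficient (sum of counts)! / prod(count!)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def allele_counts(haplotype_counts, j):
    """n_jk: how many genes at locus j carry each allele k."""
    tally = Counter()
    for haplotype, count in haplotype_counts.items():
        tally[haplotype[j]] += count
    return tally


def fisher_yates_probability(haplotype_counts):
    """Pr({n_i}) = C(n; {n_i}) / prod_j C(n; {n_jk}), C a multinomial."""
    m = len(next(iter(haplotype_counts)))
    numerator = multinomial(list(haplotype_counts.values()))
    denominator = prod(
        multinomial(list(allele_counts(haplotype_counts, j).values()))
        for j in range(m)
    )
    return Fraction(numerator, denominator)


def brute_force_probability(haplotype_counts):
    """Enumerate all (n!)^m row permutations and count matching column tallies.

    itertools.permutations treats equal alleles in different positions as
    distinct, which plays the role of the starred duplicates.
    """
    haplotypes = [h for h, c in haplotype_counts.items() for _ in range(c)]
    m = len(haplotypes[0])
    rows = [[h[j] for h in haplotypes] for j in range(m)]
    target = Counter(haplotype_counts)
    hits = total = 0
    for perms in product(*(permutations(row) for row in rows)):
        total += 1
        hits += Counter(zip(*perms)) == target
    return Fraction(hits, total)


counts_4_5 = {("a1", "b3", "c2"): 1, ("a2", "b1", "c1"): 1,
              ("a1", "b1", "c3"): 1, ("a2", "b2", "c2"): 1}
print(fisher_yates_probability(counts_4_5), brute_force_probability(counts_4_5))
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt in the comparison; the enumeration is only feasible for tiny $m$ and $n$, which is the point of having the closed form.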
This alternative representation yields a device for random sampling from
the Fisher-Yates distribution [24]. If we arrange our observed haplotypes
in an m × n matrix as described above and randomly permute the entries
within each row, then we get a new matrix whose haplotype counts are
drawn from the Fisher-Yates distribution. For example, appropriate permutations within each row of the matrix (4.5) produce the matrix

$$
\begin{pmatrix}
a_1 & a_1^* & a_2 & a_2^* \\
b_1 & b_1^* & b_2 & b_3 \\
c_2 & c_2^* & c_1 & c_3
\end{pmatrix}
$$

with nonzero haplotype counts

$$
n_{a_1 b_1 c_2} = 2, \quad n_{a_2 b_2 c_1} = 1, \quad n_{a_2 b_3 c_3} = 1.
$$
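The row-permutation sampler can be sketched as follows. This is a minimal illustration, not the procedure of reference [24]: it assumes the same hypothetical dict-of-haplotype-tuples representation of $\{n_i\}$ and uses Python's `random.Random.shuffle` for the uniform permutation of each row. A Monte Carlo run then compares the empirical frequency of the observed counts from (4.5) with their exact probability $24/(6 \cdot 12 \cdot 12) = 1/36$ from the formula above.

```python
import random
from collections import Counter


def sample_fisher_yates(haplotype_counts, rng):
    """Draw one haplotype count vector by shuffling each row of the m x n matrix.

    Shuffling every row (including the first) follows the description in the
    text; shuffling only m - 1 rows gives the same distribution, since column
    order does not affect the counts.
    """
    haplotypes = [h for h, c in haplotype_counts.items() for _ in range(c)]
    m = len(haplotypes[0])
    rows = [[h[j] for h in haplotypes] for j in range(m)]
    for row in rows:
        rng.shuffle(row)  # uniform random permutation of the genes at one locus
    return Counter(zip(*rows))


observed = {("a1", "b3", "c2"): 1, ("a2", "b1", "c1"): 1,
            ("a1", "b1", "c3"): 1, ("a2", "b2", "c2"): 1}
rng = random.Random(2024)
print(dict(sample_fisher_yates(observed, rng)))

# Monte Carlo check: the frequency of the observed vector should approach its
# exact Fisher-Yates probability, 24 / (6 * 12 * 12) = 1/36.
draws = 20000
hits = sum(sample_fisher_yates(observed, rng) == Counter(observed)
           for _ in range(draws))
print(hits / draws)
```

Because each row is only permuted, the allele counts $n_{jk}$ at every locus are preserved in every draw, exactly as in the permuted matrix above.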
