Page 85 - Applied Probability

4. Hypothesis Testing and Categorical Data
instance, if we distinguish duplicate alleles by a superscript ∗, then the 3×4 matrix
$$
\begin{pmatrix}
a_1 & a_2 & a_1^* & a_2^* \\
b_3 & b_1 & b_1^* & b_2 \\
c_2 & c_1 & c_3 & c_2^*
\end{pmatrix}
\tag{4.5}
$$
for m = 3 loci and n = 4 haplotypes represents one out of $(4!)^3$ equally likely matrices and yields the nonzero haplotype counts
$$
\begin{aligned}
n_{a_1 b_3 c_2} &= 1 \\
n_{a_2 b_1 c_1} &= 1 \\
n_{a_1 b_1 c_3} &= 1 \\
n_{a_2 b_2 c_2} &= 1.
\end{aligned}
$$
To count the number of matrices consistent with a haplotype count vector $\{n_i\}$, note that the haplotypes can be assigned to the columns of a typical matrix from the uniform space in $\binom{n}{\{n_i\}}$ ways. Within each such assignment, there are $\prod_{j=1}^{m} \prod_{k} n_{jk}!$ permutations of the genes of the various allele types among the available positions for each allele type. It follows that the haplotype count vector $\{n_i\}$ has probability
$$
\Pr(\{n_i\})
= \frac{\binom{n}{\{n_i\}} \prod_{j=1}^{m} \prod_{k} n_{jk}!}{(n!)^m}
= \frac{\binom{n}{\{n_i\}}}{\prod_{j=1}^{m} \binom{n}{\{n_{jk}\}}}.
$$
                              In other words, we recover the Fisher-Yates distribution.
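The counting argument can be checked numerically on the small example (4.5). The following is a minimal Python sketch, not code from the text: the function names `multinomial`, `fisher_yates_prob`, and `brute_force_prob` are hypothetical, and haplotypes are encoded as tuples of allele labels with the superscript ∗ dropped. It evaluates the closed-form probability and compares it with a brute-force enumeration of all $(4!)^3$ row permutations.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def multinomial(counts):
    """Multinomial coefficient n! / (c_1! c_2! ...) with n = sum(counts)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def fisher_yates_prob(haplotypes):
    """Closed-form probability of the haplotype count vector {n_i}."""
    n_i = Counter(haplotypes).values()
    # Allele counts n_jk at each locus j (each locus is one row of the matrix).
    loci = [Counter(row).values() for row in zip(*haplotypes)]
    return Fraction(multinomial(n_i), prod(multinomial(n_jk) for n_jk in loci))


def brute_force_prob(haplotypes):
    """Same probability by enumerating all (n!)^m row permutations."""
    rows = list(zip(*haplotypes))
    target = Counter(haplotypes)
    hits = total = 0
    # permutations() treats positions as distinct, so duplicate alleles
    # are counted as distinguishable genes, as in the starred notation.
    for permuted_rows in product(*(permutations(row) for row in rows)):
        total += 1
        if Counter(zip(*permuted_rows)) == target:
            hits += 1
    return Fraction(hits, total)


# Columns of matrix (4.5), one tuple per haplotype.
haplotypes = [
    ("a1", "b3", "c2"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c3"),
    ("a2", "b2", "c2"),
]

print(fisher_yates_prob(haplotypes))  # 1/36
print(brute_force_prob(haplotypes))   # 1/36
```

Here the multinomial coefficient for four distinct haplotypes is 4! = 24, and the locus coefficients are 6, 12, and 12, so both routes give 24/864 = 1/36. The enumeration is only feasible for very small n and m.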
                                This alternative representation yields a device for random sampling from
                              the Fisher-Yates distribution [24]. If we arrange our observed haplotypes
                              in an m × n matrix as described above and randomly permute the entries
                              within each row, then we get a new matrix whose haplotype counts are
drawn from the Fisher-Yates distribution. For example, appropriate permutations within each row of the matrix (4.5) produce the matrix
$$
\begin{pmatrix}
a_1 & a_1^* & a_2 & a_2^* \\
b_1 & b_1^* & b_2 & b_3 \\
c_2 & c_2^* & c_1 & c_3
\end{pmatrix}
$$
with nonzero haplotype counts
$$
\begin{aligned}
n_{a_1 b_1 c_2} &= 2 \\
n_{a_2 b_2 c_1} &= 1 \\
n_{a_2 b_3 c_3} &= 1.
\end{aligned}
$$
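The sampling device described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the implementation of reference [24]: the helper names `permute_rows` and `haplotype_counts` are hypothetical, the observed matrix is stored as a list of rows (one per locus), and the seed is arbitrary.

```python
import random
from collections import Counter


def permute_rows(matrix, rng):
    """Return a new m x n matrix with each row independently shuffled."""
    permuted = []
    for row in matrix:
        new_row = list(row)   # copy so the observed data is left untouched
        rng.shuffle(new_row)  # uniform random permutation within the row
        permuted.append(new_row)
    return permuted


def haplotype_counts(matrix):
    """Count haplotypes, i.e. the columns of the matrix."""
    return Counter(zip(*matrix))


# Rows of matrix (4.5) with the superscript * dropped.
observed = [
    ["a1", "a2", "a1", "a2"],
    ["b3", "b1", "b1", "b2"],
    ["c2", "c1", "c3", "c2"],
]

rng = random.Random(0)  # arbitrary seed, only for reproducibility
sample = haplotype_counts(permute_rows(observed, rng))
print(sample)
```

Each call yields one draw of the haplotype count vector. The allele counts at every locus are unchanged by construction, so only the way alleles are combined into haplotypes varies from draw to draw.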