Page 93 - Applied Probability

4. Hypothesis Testing and Categorical Data
                                10. A geneticist phenotypes $n$ unrelated people at each of $m$ loci with
                                   codominant alleles and records a vector $i = (i_1/i_1^*, \ldots, i_m/i_m^*)$ of
                                   genotypes for each person. Because phase is unknown, $i$ cannot be
                                   resolved into two haplotypes. The data gathered can be summarized
                                   by the number of people $n_i$ counted for each genotype vector $i$. Let $n_{jk}$
                                   be the number of alleles of type $k$ at locus $j$ observed in the sample,
                                   and let $n_h$ be the total number of heterozygotes observed over all
                                   loci. Assuming genetic equilibrium, prove that the distribution of the
                                   counts $\{n_i\}$ conditional on the allele totals $\{n_{jk}\}$ is
                                   $$\Pr(\{n_i\} \mid \{n_{jk}\}) = \frac{\binom{n}{\{n_i\}}\, 2^{n_h}}{\prod_{j=1}^{m} \binom{2n}{\{n_{jk}\}}}. \tag{4.10}$$
                                   The moments of the distribution (4.10) are computed in [24]; just as
                                   with haplotype count data, all allele frequencies cancel.
                                11. Describe and program an efficient algorithm for generating random
                                   permutations of the set {1,...,n}. How many calls of a random number
                                   generator are involved? How many interchanges of two numbers?
                                   You might wish to compare your results to the algorithm in [29].
                                12. Describe and program a permutation version of the two-sample t-test.
                                   Compare it on actual data to the standard two-sample t-test.
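
As a starting point for Problems 11 and 12, the following Python sketch shows one possible approach; it is not claimed to be the algorithm of [29]. It assumes the Fisher-Yates shuffle for generating permutations, a pooled-variance t statistic, and a Monte Carlo sample of permutations rather than full enumeration. The function names, the default of 2000 permutations, the fixed seed, and the small floating-point tolerance are all illustrative choices, not part of the problem statements.

```python
import math
import random


def random_permutation(n, rng=random):
    """Fisher-Yates shuffle: a uniform random permutation of 1..n.

    Makes n - 1 random draws and n - 1 interchanges (some of which
    swap a position with itself).
    """
    perm = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # uniform on {0, ..., i}, endpoints included
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic.

    Assumes the pooled within-group variance is nonzero.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def permutation_t_test(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value based on |t|."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    nx = len(x)
    observed = abs(t_statistic(x, y))
    hits = 0
    for _ in range(n_perm):
        perm = random_permutation(len(pooled), rng)
        shuffled = [pooled[p - 1] for p in perm]  # relabel group membership
        t_perm = abs(t_statistic(shuffled[:nx], shuffled[nx:]))
        if t_perm >= observed - 1e-12:  # tolerance guards against rounding
            hits += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_t_test(x, y)` on two lists of measurements returns an approximate p-value that can be set beside the p-value from the standard two-sample t-test computed on the same data. For small samples, enumerating all splits of the pooled data would give the exact permutation distribution instead of a Monte Carlo estimate.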
                              4.10    References
                               [1] Agresti A (1992) A survey of exact inference for contingency tables.
                                   Stat Sci 7:131–177

                               [2] Allison DB, Heo M, Kaplan N, Martin ER (1999) Sibling-based tests
                                   of linkage and association for quantitative traits. Amer J Hum Genet
                                   64:1754–1763

                               [3] Badner JA, Chakravarti A, Wagener DK (1984) A test of nonrandom
                                   segregation. Genet Epidemiol 1:329–340
                               [4] Barbour AD, Holst L, Janson S (1992) Poisson Approximation. Oxford
                                   University Press, Oxford

                               [5] Boehnke M, Langefeld CD (1998) Genetic association mapping based
                                   on discordant sib pairs: the discordant-alleles test. Amer J Hum Genet
                                   62:950–961

                               [6] Cavalli-Sforza LL, Bodmer WF (1971) The Genetics of Human
                                   Populations. Freeman, San Francisco