Page 87 - Applied Probability

4. Hypothesis Testing and Categorical Data
                              The levels for disease status are case and control. The levels for genotype
                              are the various observed genotypes among either cases or controls. One
                              can use multilocus genotypes rather than single-locus genotypes and, when
                              available, haplotypes rather than genotypes. The use of haplotypes doubles
                              sample size and leads to smaller tables with less sparsity. Because cases
                              often differ from controls in the overabundance of one or two genotypes
                              (or haplotypes), it is desirable to implement a test that is sensitive to such
                              departures. A variation on the $Z_{\max}$ test is nearly ideal in this regard.
                              Consider the standardized residuals

$$
Z_{ij} = \frac{c_{ij} - E(c_{ij})}{\sqrt{\mathrm{Var}(c_{ij})}},
$$

                              where $c_{1j}$ is the number of times genotype $j$ appears among cases and $c_{2j}$
                              is the number of times genotype $j$ appears among controls. The statistic
                              $Z_{\max} = \max_{i,j} |Z_{ij}|$ simplifies to $Z_{\max} = \max_j |Z_{1j}|$ because
                              $Z_{2j} = -Z_{1j}$ when the genotype totals $c_{1j} + c_{2j}$ are held fixed.
                              Permuting the case-control labels offers a way to approximate
                              the distribution of this statistic. Problems 8 and 9 give the mean and
                              variance of $c_{1j}$ as

$$
\begin{aligned}
E(c_{1j}) &= \frac{c_{1.}\, c_{.j}}{n} \\
\mathrm{Var}(c_{1j}) &= \frac{c_{1.}(c_{1.} - 1)\, c_{.j}(c_{.j} - 1)}{n(n - 1)} + E(c_{1j}) - E(c_{1j})^2,
\end{aligned}
\tag{4.6}
$$

                              where $c_{1.}$ is the number of cases, $c_{.j}$ is the number of times genotype $j$
                              appears among cases and controls combined, and $n$ is the number of cases plus
                              the number of controls. The marginal sums $c_{1.}$ and $c_{.j}$ are the analogs of
                              the marginal allele counts $n_{jk}$ in the linkage equilibrium problem.
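
The recipe above can be sketched in code. This is a minimal illustration rather than the author's implementation: it assumes NumPy, the function names are hypothetical, and the counts in the usage note are made-up toy values, not the data of Table 4.1. It computes the standardized residuals from the moments in (4.6) and approximates the p-value of the maximum absolute residual by shuffling case-control labels.

```python
import numpy as np


def standardized_residuals(case_counts, control_counts):
    """Return Z_1j for each genotype j, using the moments in (4.6).

    Assumes at least two genotypes are observed, so no variance is zero.
    """
    c1 = np.asarray(case_counts, dtype=float)     # c_1j
    c2 = np.asarray(control_counts, dtype=float)  # c_2j
    col = c1 + c2                                 # c_.j
    n_cases = c1.sum()                            # c_1.
    n = col.sum()                                 # n
    mean = n_cases * col / n
    var = (n_cases * (n_cases - 1) * col * (col - 1) / (n * (n - 1))
           + mean - mean ** 2)
    return (c1 - mean) / np.sqrt(var)


def z_max(case_counts, control_counts):
    """Largest absolute residual; row 1 suffices because Z_2j = -Z_1j."""
    return np.max(np.abs(standardized_residuals(case_counts, control_counts)))


def permutation_p_value(case_counts, control_counts, n_perm=10_000, seed=0):
    """Approximate the p-value of Z_max by permuting case-control labels."""
    rng = np.random.default_rng(seed)
    c1 = np.asarray(case_counts)
    c2 = np.asarray(control_counts)
    m = len(c1)
    # One genotype label per sampled individual, cases listed first.
    genotypes = np.concatenate([np.repeat(np.arange(m), c1),
                                np.repeat(np.arange(m), c2)])
    n_cases = int(c1.sum())
    col = c1 + c2
    observed = z_max(c1, c2)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(genotypes)
        # The first n_cases shuffled individuals play the role of cases.
        perm_cases = np.bincount(shuffled[:n_cases], minlength=m)
        if z_max(perm_cases, col - perm_cases) >= observed:
            hits += 1
    return hits / n_perm
```

For instance, with hypothetical counts `cases = [30, 10, 10]` and `controls = [15, 20, 15]`, calling `permutation_p_value(cases, controls)` returns the fraction of shuffles whose $Z_{\max}$ is at least as large as the observed value. Shuffling genotype labels while holding the number of cases fixed keeps both margins $c_{1.}$ and $c_{.j}$ constant, which is the conditioning under which the moments in (4.6) apply.
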
                              Example 4.7.1 Exact Treatment of the ABO Ulcer Data

                              The ABO ulcer data of Table 4.1 provide a chance to compare the various
                              test statistics. The permutation version of Fisher’s exact test and the $Z_{\max}$
                              test give p-values of 0.0335 ± 0.0036 and 0.0169 ± 0.0026, respectively, for
                              10,000 permutations. As anticipated, the $Z_{\max}$ statistic attains its
                              maximum for genotype O. These results compare well with the p-value of 0.0295
                              for the likelihood ratio test and suggest that the $Z_{\max}$ statistic possesses
                              somewhat greater power than the other two statistics for detecting
                              departures in a single genotype.
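
The text does not say how the ± terms attached to the permutation p-values were obtained. As a hedged reading, they are numerically consistent with two binomial standard errors, $2\sqrt{\hat{p}(1-\hat{p})/N}$ with $N = 10{,}000$ permutations. The short check below illustrates that arithmetic; the function name and the factor of two are assumptions of this sketch, not statements from the source.

```python
import math


def monte_carlo_half_width(p_hat, n_perm, n_se=2.0):
    """Half-width of an approximate interval around a permutation p-value.

    Treats the number of permutations exceeding the observed statistic as
    binomial with success probability p_hat (an assumption of this sketch).
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return n_se * standard_error


for label, p_hat in [("Fisher's exact test", 0.0335), ("Z_max test", 0.0169)]:
    half_width = monte_carlo_half_width(p_hat, 10_000)
    print(f"{label}: {p_hat:.4f} +/- {half_width:.4f}")
```

Under this reading, halving the reported uncertainty would take roughly four times as many permutations, since the standard error shrinks like $1/\sqrt{N}$.
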



                              4.8 The Transmission/Disequilibrium Test

                              Example 4.2.1 on the association between the ABO system and duodenal
                              ulcer depended on detecting a difference in allele frequencies between
                              patients and normal controls. In a racially homogeneous population like that