Page 103 - Becoming Metric Wise

              Table 4.4 z-Values and corresponding N(z)-values
              z        0        1.282     1.645      1.960     2.326      2.576
              N(z)     0.5      0.9       0.95       0.975     0.99       0.995


              z-score is obtained by subtracting the average and dividing the result by the
              standard deviation. This leads to a z-score of (39 − 25)/7 = 2.
                 Using z-scores is a way of comparing data from different normal
              distributions.
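
              The standardization above can be checked numerically. The following is a
              minimal sketch, assuming the observation 39, average 25 and standard
              deviation 7 from the worked example; the helper name z_score is
              hypothetical, and N(z) is taken to be the cumulative standard normal
              distribution, which is what Table 4.4 tabulates.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd


# Values assumed from the worked example: observation 39, mean 25, sd 7.
z = z_score(39, 25, 7)

# N(z): cumulative probability of the standard normal distribution at z.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
n_of_z = standard_normal.cdf(z)

# Recompute the N(z) row of Table 4.4, rounded to three decimals.
table_z = [0, 1.282, 1.645, 1.960, 2.326, 2.576]
table_n = [round(standard_normal.cdf(v), 3) for v in table_z]

print(z, round(n_of_z, 4))
print(table_n)
```

              Because both values are expressed in units of standard deviations, the
              same function can be applied to observations from two different normal
              distributions and the resulting z-scores compared directly.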

              4.13 HYPOTHESIS TESTING

              We only discuss two nonparametric tests: the chi-square test for indepen-
              dence and homogeneity in tables and the Mann-Whitney U-test for
              equality of distributions. We assume that the reader is already familiar
              with basic hypothesis testing such as the z- or the t-test.
                 Nonparametric tests are designed to avoid assumptions inherent in
              more common tests. Usually these tests assume that variables are
              normally distributed or are distributed according to a distribution that is
              related to the normal distribution. For instance, the standard test on the
              difference between two means assumes that the distribution of the means
              follows a t-distribution (a distribution which very much resembles the
              normal distribution) and that data are sampled from continuous distribu-
              tions with the same variance. These assumptions are rarely met for infor-
              metric data.

              4.13.1 Test of Independence in Contingency Tables

              This subsection is largely taken from Egghe and Rousseau (1990,
              [I.3.5.3]). A contingency table, as discussed in Section 4.9, is a multiple
              classification. Items under study are classified according to two criteria,
              one having m categories and the other having n. Hence the contingency
              table is an $m \times n$ matrix. Cell frequencies are denoted as $O_{ij}$ and
              $\sum_{i,j} O_{ij} = N$. Cell frequencies obtained under the assumption of
              independence, see Table 4.3, are the expected values, denoted as $E_{ij}$.
              These are compared with the observed frequencies $O_{ij}$. Then the quantity

                  \[ \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (4.23) \]