Page 105 - Becoming Metric Wise

              Table 4.5 Number of publications of information scientists at university A and
              university B

              Affiliation     A   B   B   A   B   A   B   A   B   B   B   A   B   B   A   A   B   A
              Publications    5   8  11  12  14  16  17  19  22  26  38  40  51  57  61  76  90 105
              Rank            1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
              Rank runs from the lowest producer (1) to the highest (18).

              4.13.2 Mann-Whitney U-test for Equality of Distributions
              (Mann & Whitney, 1947)
              This test, also known as the Mann-Whitney-Wilcoxon test, is a
              nonparametric two-sample rank test. Its null hypothesis is that the
              two samples come from the same population (or from two populations
              with the same statistical properties). The alternative is often
              formulated as a one-sided hypothesis: one population has larger values
              than the other. The following explanation is largely taken from
              Egghe and Rousseau (1990). Suppose we want to know whether
              information scientists at university A are more productive than
              information scientists at university B. One may assume that outputs
              differ, but the question is whether these differences can be attributed
              to chance fluctuations. Therefore we consider their publication lists
              over the past 8 years. Results are shown in Table 4.5. Note that eight
              information scientists work at university A and ten at university B.
              For the moment we assume that no two scientists have the same output,
              hence there are no ties.
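
              As an illustration, the ranking in Table 4.5 can be reproduced with a
              short Python sketch. The names pubs_a, pubs_b and rank_sums are
              illustrative assumptions, not part of the original text, and the
              sketch only works when there are no ties.

```python
# Publication counts transcribed from Table 4.5.
pubs_a = [5, 12, 16, 19, 40, 61, 76, 105]         # university A (8 scientists)
pubs_b = [8, 11, 14, 17, 22, 26, 38, 51, 57, 90]  # university B (10 scientists)


def rank_sums(first, second):
    """Return the rank sums of two samples in the pooled ordering.

    Rank 1 goes to the lowest value. Assumes no ties, as in the text.
    """
    pooled = sorted(first + second)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties present; this sketch assumes distinct values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in first), sum(rank[v] for v in second)


t_a, t_b = rank_sums(pubs_a, pubs_b)
print(t_a, t_b)  # 80 91
```

              The two rank sums add up to 171, the sum of the ranks 1 to 18.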
                 The test is derived from the following line of reasoning. If the
              publication outputs of the information scientists at these two
              universities differ strongly, the lower numbers of publications will
              mainly be found for scientists at one university and the higher numbers
              for the others. In the most extreme case, the lowest ranks will all be
              assigned to one group and the highest to the other. If the first group
              has m members and the second one n members, and if the members of the
              second group all publish more than those of the first, then the sum of
              the ranks of the second group, denoted by T_2, will be at its maximum.
              This maximum sum is equal to nm + n(n + 1)/2. Indeed, in this extreme
              case the members of the second group occupy ranks m + 1 up to m + n.
              The sum of these ranks is equal to the sum of the first m + n natural
              numbers minus the sum of the first m natural numbers. This is: