Page 107 - Becoming Metric Wise
This notation means that the expression on the left-hand side is
distributed as the standard normal distribution (the normal distribution
with mean 0 and variance 1). It is now possible to apply a standard normal
(parametric) test (a so-called z-test); see, e.g., Egghe and Rousseau (2001).
For smaller values of m and n, one has to use dedicated software (or
printed tables).
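As an illustration of the z-test mentioned above, the following minimal Python sketch computes a Mann-Whitney count and its standard normal approximation. It rests on assumptions not taken from the text: the function name `mann_whitney_z` is hypothetical, the statistic is taken to be the number of pairs in which an observation of the second sample precedes one of the first, there are no ties, and the usual large-sample mean mn/2 and variance mn(m + n + 1)/12 are used. It is not meant to replace dedicated software, especially for small m and n.

```python
import math
from statistics import NormalDist


def mann_whitney_z(x, y):
    """Normal approximation for a Mann-Whitney count (assumes no ties)."""
    m, n = len(x), len(y)
    # Count the pairs (xi, yj) in which the y-value precedes the x-value.
    u = sum(1 for xi in x for yj in y if yj < xi)
    # Large-sample mean and variance of the count under the null hypothesis.
    mean = m * n / 2
    variance = m * n * (m + n + 1) / 12
    z = (u - mean) / math.sqrt(variance)
    # Two-sided p-value read from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the book.
    sample_a = [12, 15, 9, 20, 31, 18, 25, 14]
    sample_b = [7, 11, 5, 16, 8, 10, 13, 6]
    print(mann_whitney_z(sample_a, sample_b))
```

With samples this small the normal approximation is only indicative; the exact distribution (software or tables) is the safer choice.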
When there are ties, all outputs with the same value receive the median of
the ranks they jointly occupy. In such cases statistical software automatically applies
a correction for ties. Huber and Wagner-Döbler (2003) provide a handy
spreadsheet when using the Mann-Whitney test in case there are ties and
no statistical software package is available. Note that their formulae are
slightly different from ours as their U is our T. We followed the original
notation from Mann and Whitney.
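The ranking rule for ties can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `midranks` is hypothetical, ranks start at 1 for the smallest value, and the tie correction applied by statistical software to the variance is not included. Because tied values occupy consecutive ranks, the median of those ranks equals their mean.

```python
def midranks(values):
    """Rank values from 1; tied values share the median of their ranks."""
    # Indices of the values, sorted from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j (0-based) occupy ranks i+1..j+1; take their midpoint.
        shared = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Made-up illustrative data: the two values 20 share ranks 2 and 3.
    print(midranks([10, 20, 20, 30]))
```

The ranks always sum to N(N + 1)/2 for N values, which gives a quick check that ties have been handled consistently.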
Finally, we mention the case in which the two populations under study,
with distribution functions $F_1$ and $F_2$, can be assumed to be continuous
and the alternative is stated as a shift in location, i.e.,
$F_1(x) = F_2(x + \delta)$. This means that the two distributions have the
same shape. Under these circumstances, a rejection of the null hypothesis
can be interpreted as showing a difference in medians.
4.14 CONCLUDING REMARKS ON STATISTICS
For an elementary introduction to issues on parametric and nonparamet-
ric statistical inference we refer the reader to Egghe and Rousseau (2001)
and Vaughan (2001).
It has recently come to the attention of scientists that statistics are
sometimes misused and that the traditional method of hypothesis testing
may lead to invalid or useless conclusions (Ioannidis, 2005; Schneider,
2012, 2013, 2015; Vidgen & Yasseri, 2016). Three points are important
here: not reporting effect sizes, not reporting statistical power, and the
misuse of P-values. A Bayesian approach (the “traditional” method is
then referred to as the frequentist approach) may be the solution to the
last problem. Yet, we consider these topics to be outside the scope of this
book. Moreover, when working with a “complete population”, statistical
significance tests are irrelevant as there is no sampling error. In these
circumstances, one must use the techniques provided by descriptive statistics.
Yet, such populations often change over time, and as such they too can
be considered to be samples. Discussions of this point can be found in
issue 10(4) of the Journal of Informetrics.