Page 85 - Becoming Metric Wise
Statistics
Normalizing is often done by dividing by the arithmetic mean. In
some cases, however, scientists have preferred normalizing by a geometric
mean, as did D’Souza and Smalheiser (2014). We draw the reader's
attention to the requirement that in (4.3) all numbers must be strictly
positive. Indeed, as soon as one number is zero, the geometric mean is zero
too and becomes virtually meaningless. We further note that the geometric
mean can be rewritten as the antilog (an exponential function) of the
arithmetic mean of the logarithms of the data. Concretely:
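$$
\begin{aligned}
\exp\left(\frac{1}{n}\sum_{j=1}^{n}\ln(y_j)\right)
&= \exp\left(\sum_{j=1}^{n}\frac{\ln(y_j)}{n}\right)
= e^{\frac{\ln(y_1)}{n}} \cdot e^{\frac{\ln(y_2)}{n}} \cdots e^{\frac{\ln(y_n)}{n}} \\
&= \left(e^{\ln y_1}\right)^{1/n} \cdot \left(e^{\ln y_2}\right)^{1/n} \cdots \left(e^{\ln y_n}\right)^{1/n} \\
&= y_1^{1/n} \cdot y_2^{1/n} \cdots y_n^{1/n}
= \left(y_1 \cdot y_2 \cdots y_n\right)^{1/n}
= \sqrt[n]{y_1 \cdot y_2 \cdots y_n}
\end{aligned}
\tag{4.5}
$$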
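The identity in (4.5) can be checked numerically. The following is a minimal Python sketch, not taken from the book: the function names `geometric_mean_direct` and `geometric_mean_via_logs` are illustrative assumptions, and both functions reject non-positive input in line with the strict-positivity requirement stated above.

```python
import math


def geometric_mean_direct(values):
    """n-th root of the product of the values (right-hand side of (4.5))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_via_logs(values):
    """Antilog of the arithmetic mean of the logarithms (left-hand side of (4.5))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Hypothetical data chosen only for illustration: the product is 27, so the
# cube root is 3 up to floating-point rounding.
sample = [1, 3, 9]
print(geometric_mean_direct(sample), geometric_mean_via_logs(sample))
```

For long lists the logarithmic form is usually the safer choice in floating-point arithmetic, because the raw product can overflow or underflow before the root is taken.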
The geometric mean in (4.5) can be approximated by:
$$
\exp\left(\frac{1}{n}\sum_{j=1}^{n}\ln(y_j + 1)\right) - 1
\tag{4.6}
$$
which has the advantage of being applicable when zeros occur, as is the
case for citation data. Formula (4.6), applied to journal citation data, is
proposed in Thelwall and Fairclough (2015) as an alternative to the
standard journal impact factor, which is an arithmetic mean. In Subsection
6.7.3 we will show that when the geometric mean is used, the average
and the global impact factors of a set of journals coincide.
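As an illustration of formula (4.6), the sketch below computes the shifted geometric mean of a list of citation counts. The function name `offset_geometric_mean` and the sample counts are assumptions made for this example; the code is not taken from Thelwall and Fairclough (2015).

```python
import math


def offset_geometric_mean(counts):
    """Formula (4.6): exp(mean of ln(y_j + 1)) - 1, defined for counts >= 0."""
    if not counts:
        raise ValueError("at least one count is required")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    # log1p(c) computes ln(c + 1) and expm1(x) computes exp(x) - 1.
    mean_log = sum(math.log1p(c) for c in counts) / len(counts)
    return math.expm1(mean_log)


# Hypothetical citation counts for two articles, one of them uncited.
citations = [0, 3]
print(offset_geometric_mean(citations))   # shifted geometric mean
print(sum(citations) / len(citations))    # arithmetic mean, for comparison
```

With the counts 0 and 3, the mean of ln(1) and ln(4) is ln(2), so (4.6) gives a value of about 1, whereas the arithmetic mean is 1.5. The uncited article does not force the result to zero, as it would for the plain geometric mean.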
The revised source-normalized impact per paper indicator (Waltman
et al., 2013), see Chapter 6: Journal Citation Analysis, formula (6.17),
makes use of harmonic means. An important harmonic mean is the
F-score used in information retrieval. It is defined as the harmonic mean
of recall (REC) and precision (PREC), where recall is the number of
retrieved and relevant items divided by the number of relevant items in
the database, and precision is the number of retrieved and relevant items
divided by the number of retrieved items. Hence we have:
$$
F = \frac{2}{\dfrac{1}{\mathrm{REC}} + \dfrac{1}{\mathrm{PREC}}}
\tag{4.7}
$$
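A small worked example may help to make (4.7) concrete. The sketch below computes recall, precision and the F-score from two sets of item identifiers. The function name `f_score`, the sample identifiers, and the convention of returning 0 when no relevant item is retrieved are assumptions made for this illustration.

```python
def f_score(retrieved, relevant):
    """Harmonic mean of recall and precision, as in (4.7)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items that are retrieved and relevant
    if hits == 0:
        # Assumed convention: recall and precision are both 0 (or undefined),
        # so the harmonic mean is reported as 0.
        return 0.0
    rec = hits / len(relevant)    # recall
    prec = hits / len(retrieved)  # precision
    return 2 / (1 / rec + 1 / prec)


# Hypothetical search result: 4 items retrieved, 6 relevant items in the
# database, 2 items in common.
retrieved_items = {1, 2, 3, 4}
relevant_items = {3, 4, 5, 6, 7, 8}
print(f_score(retrieved_items, relevant_items))
```

Here recall is 2/6 = 1/3 and precision is 2/4 = 1/2, so F = 2 / (3 + 2) = 0.4, which lies between the two values and closer to the smaller one, as is typical of a harmonic mean.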
If $O_1$ and $O_2$ are the two overlap measures introduced in Egghe and
Michel (2002) then their geometric mean is the Salton measure, while