Page 85 - Becoming Metric Wise
                                                                   Statistics

   Normalizing is often done by dividing by the arithmetic mean. In some
cases, however, scientists have preferred to normalize by a geometric mean;
see, for example, D'Souza and Smalheiser (2014). We draw the reader's
attention to the requirement that in (4.3) all numbers must be strictly
positive. Indeed, as soon as one number is zero, the geometric mean is zero
too and becomes virtually meaningless. We further note that the geometric
mean can be rewritten as the antilog (an exponential function) of the
arithmetic mean of the logarithms of the data. Concretely:
$$
\exp\left(\frac{1}{n}\sum_{j=1}^{n}\ln(y_j)\right)
= \exp\left(\sum_{j=1}^{n}\frac{\ln(y_j)}{n}\right)
= e^{\frac{\ln(y_1)}{n}}\cdot e^{\frac{\ln(y_2)}{n}}\cdots e^{\frac{\ln(y_n)}{n}}
= e^{\ln\left(y_1^{1/n}\right)}\cdot e^{\ln\left(y_2^{1/n}\right)}\cdots e^{\ln\left(y_n^{1/n}\right)}
= y_1^{1/n}\cdot y_2^{1/n}\cdots y_n^{1/n}
= \left(y_1\cdot y_2\cdots y_n\right)^{1/n}
= \sqrt[n]{y_1\cdot y_2\cdots y_n}
\tag{4.5}
$$
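As a quick numerical check of (4.5), the following Python sketch computes the geometric mean in two ways: as the n-th root of the product and as the antilog of the arithmetic mean of the natural logarithms. The function names and the sample data are our own, chosen for illustration. Both functions reject non-positive values, reflecting the strict-positivity requirement noted above.

```python
import math


def geometric_mean_direct(values):
    """n-th root of the product y_1 * ... * y_n (right-hand side of (4.5))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_via_logs(values):
    """exp of the arithmetic mean of ln(y_j) (left-hand side of (4.5))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


data = [2.0, 8.0, 4.0]  # illustrative values; their product is 64
print(geometric_mean_direct(data), geometric_mean_via_logs(data))  # both approximately 4.0
```

The two results may differ in the last floating-point digit. The logarithmic form is the usual choice in practice, because the product of many values can overflow or underflow floating-point arithmetic.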
                 This expression can be approximated by:

$$
\exp\left(\frac{1}{n}\sum_{j=1}^{n}\ln(y_j+1)\right) - 1
\tag{4.6}
$$
which has the advantage of being applicable when zeros occur, as is the
case for citation data. Formula (4.6), applied to journal citation data, is
proposed in Thelwall and Fairclough (2015) as an alternative to the standard
journal impact factor, which is an arithmetic mean. In Subsection 6.7.3 we
will show that when the geometric mean is used, the average and the global
impact factors of a set of journals coincide.
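A minimal sketch of (4.6) in Python follows, applied to a small set of invented citation counts. The data and function names are illustrative assumptions, not taken from Thelwall and Fairclough (2015). Zero counts pose no problem, because each count is shifted by 1 before the logarithm is taken and the final subtraction of 1 undoes the shift.

```python
import math


def offset_geometric_mean(citations):
    """Formula (4.6): exp(mean of ln(c + 1)) - 1; zero counts are allowed."""
    if not citations:
        raise ValueError("need at least one value")
    if any(c < 0 for c in citations):
        raise ValueError("citation counts must be non-negative")
    mean_log = sum(math.log(c + 1) for c in citations) / len(citations)
    return math.exp(mean_log) - 1


def arithmetic_mean(values):
    """Plain arithmetic mean, as used by the standard journal impact factor."""
    return sum(values) / len(values)


# Hypothetical citation counts for the papers of one journal (illustrative only).
counts = [0, 0, 1, 3, 7, 50]
print(arithmetic_mean(counts))        # about 10.17
print(offset_geometric_mean(counts))  # about 2.85
```

By the inequality between the arithmetic and geometric means, the geometric mean of the shifted values $y_j + 1$ never exceeds their arithmetic mean, so (4.6) is never larger than the arithmetic mean of the $y_j$. For skewed data such as these, it is also far less affected by the single highly cited paper.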
   The revised source-normalized impact per paper indicator (Waltman et
al., 2013; see Chapter 6: Journal Citation Analysis, formula (6.17)) makes
use of harmonic means. An important harmonic mean is the F-score used in
information retrieval. It is defined as the harmonic mean of recall (REC)
and precision (PREC), where recall is the number of retrieved and relevant
items divided by the number of relevant items in the database, and precision
is the number of retrieved and relevant items divided by the number of
retrieved items. Hence we have:
$$
F = \frac{2}{\dfrac{1}{\mathrm{REC}} + \dfrac{1}{\mathrm{PREC}}}
\tag{4.7}
$$
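Formula (4.7) can be illustrated with a short Python sketch that computes recall, precision, and their harmonic mean from a collection of retrieved item identifiers and a collection of relevant ones. The function name, the example data, and the convention of returning 0 when no retrieved item is relevant (in which case 1/REC or 1/PREC is undefined) are our own assumptions.

```python
def f_score(retrieved, relevant):
    """F-score (4.7): harmonic mean of recall (REC) and precision (PREC)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items that are both retrieved and relevant
    if hits == 0:
        # REC or PREC is 0 (or undefined); by our assumed convention F = 0.
        return 0.0
    recall = hits / len(relevant)      # REC
    precision = hits / len(retrieved)  # PREC
    return 2 / (1 / recall + 1 / precision)


relevant_items = range(1, 11)                 # 10 relevant items in the database
retrieved_items = [1, 2, 3, 4, 5, 6, 11, 12]  # 8 retrieved items, 6 of them relevant
print(f_score(retrieved_items, relevant_items))  # approximately 0.6667
```

Here recall is 0.6 and precision is 0.75, giving F = 2/3, slightly below the arithmetic mean 0.675 of the two. The harmonic mean never exceeds the arithmetic mean, with equality only when recall equals precision.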
   If $O_1$ and $O_2$ are the two overlap measures introduced in Egghe and
Michel (2002), then their geometric mean is the Salton measure, while