Page 80 - Becoming Metric Wise

              Indeed, following this construction we have that kW = R + W, or
              (k − 1)W = R, hence W = R/(k − 1). Yet, in practice this simple
              construction is not always possible and one needs one or two rest
              classes, e.g., every observation smaller than a given number (the upper
              bound of the first interval), and everything larger than the lower limit
              of the last interval. Next, one determines how many observations belong
              to each of the classes, leading to an absolute frequency distribution. A
              graphical representation of the resulting frequency distribution is
              called a histogram.
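The relation W = R/(k − 1) and the counting step can be sketched in a few lines of Python. This is only an illustration under one assumption that the text here does not spell out: the first class is taken to be centred on the smallest observation and the last class on the largest, which is one construction for which k classes of width W span R + W. The function names and the sample data are hypothetical.

```python
def class_edges(data, k):
    """Return the k + 1 edges of k equal-width classes.

    Assumption: the first class is centred on min(data) and the last
    class on max(data), so that k * W = R + W, i.e. W = R / (k - 1).
    """
    lo, hi = min(data), max(data)
    R = hi - lo              # range of the observations
    W = R / (k - 1)          # class width
    start = lo - W / 2       # left edge of the first class
    return [start + i * W for i in range(k + 1)]


def absolute_frequencies(data, edges):
    """Count the observations in each half-open class [edge_i, edge_{i+1})."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Made-up observations, purely for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]
edges = class_edges(data, k=6)
counts = absolute_frequencies(data, edges)
print(edges)
print(counts)
```

With rest classes, the first and last intervals would instead be open-ended; the sketch above covers only the simple case.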
                 The classical histogram is the counterpart of a bar diagram. Instead of
              drawing a bar, one draws a rectangle, centered on the midpoint of a bin
              (or class). The bins are specified as adjacent, nonoverlapping intervals of a
              variable. Histograms using relative frequencies provide a rough idea of the
              density of the population from which data have been collected. Note
              that, for such density histograms, relative frequencies must be represented
              by areas, not by heights.
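The remark about areas can be made concrete: in a density histogram the height of each rectangle is the relative frequency divided by the class width, so that the areas sum to one even when the classes have unequal widths. The following is a minimal sketch; the function name, the counts and the edges are made up for illustration.

```python
def density_heights(counts, edges):
    """Rectangle heights such that each area equals the relative frequency."""
    n = sum(counts)
    heights = []
    for i, c in enumerate(counts):
        width = edges[i + 1] - edges[i]
        heights.append((c / n) / width)   # height = relative frequency / width
    return heights


# Hypothetical counts for three classes of unequal width (1, 2 and 5).
counts = [4, 10, 6]
edges = [0, 1, 3, 8]
heights = density_heights(counts, edges)

# The total area of the rectangles is (up to rounding) equal to one.
total_area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(heights, total_area)
```

Plotting relative frequencies as heights instead would visually exaggerate the widest class in this example.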
                 A density histogram usually has one peak (it is then said to be
              unimodal) and is either approximately symmetric or skewed. If there is
              a long tail on the right-hand side, the histogram is said to be
              right-skewed, and if the long tail is to the left, it is left-skewed.
              Most histograms encountered in informetrics are right-skewed. For
              instance, there are many scientists publishing relatively few articles
              and few scientists publishing a large number of articles. This leads to
              a right-skewed distribution. If data originate from two sets with
              different characteristics, this may lead to a histogram with two peaks,
              which is then called bimodal.
                 Instead of drawing rectangles, one may connect the midpoints of the
              upper sides of the rectangles. These are then connected with the
              midpoint of the first empty class on the left and of the first empty
              class on the right, both at height zero. The resulting curve is called a
              frequency polygon. When the histogram represents relative frequencies
              as areas and all classes have equal width, the area under the frequency
              polygon is one.
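As a rough check of the statement about the area, the vertices of a frequency polygon can be built from the rectangle heights and the enclosed area computed with the trapezoid rule. The sketch assumes equal-width classes and density heights (relative frequency divided by class width); the function names and the example values are hypothetical.

```python
def frequency_polygon(heights, start, width):
    """Vertices of the frequency polygon for equal-width classes.

    The first and last vertices are the midpoints of the empty classes
    to the left and right of the histogram, both at height zero.
    """
    xs = [start + (j - 0.5) * width for j in range(len(heights) + 2)]
    ys = [0.0] + list(heights) + [0.0]
    return list(zip(xs, ys))


def polygon_area(vertices):
    """Area between the polygon and the horizontal axis (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical relative frequencies for four classes of width 2, starting at 0.
relative = [0.1, 0.4, 0.3, 0.2]
width = 2
heights = [r / width for r in relative]
vertices = frequency_polygon(heights, start=0, width=width)
print(vertices)
print(polygon_area(vertices))   # approximately 1
```

Each trapezoid between two neighbouring midpoints has area width × (average of the two heights), which is why the total matches the area of the histogram in the equal-width case.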


              4.2.4 An Illustration

              Table 4.1 shows (fictitious) publication and citation data of a scientist
              during a given publication and citation window. This scientist has published
              a total of 30 articles and the table shows how many articles have received
              zero citations, how many one citation and so on. The following column
              shows relative frequencies and the last column shows cumulative relative
              frequencies (see also Subsection 4.4.1).
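
The columns described here can be computed as follows. The counts in this sketch are made up for illustration and are not the values of Table 4.1; they are only chosen so that they add up to 30 articles. The function and variable names are hypothetical as well.

```python
from itertools import accumulate

# Hypothetical data (NOT Table 4.1): number of articles that received
# 0, 1, 2, ... citations; the counts sum to 30 articles.
articles_with = {0: 6, 1: 8, 2: 7, 3: 4, 4: 3, 5: 2}


def frequency_table(counts):
    """Rows of (value, absolute, relative, cumulative relative frequency)."""
    total = sum(counts.values())
    keys = sorted(counts)
    relative = [counts[c] / total for c in keys]
    cumulative = list(accumulate(relative))   # running sum of the relative frequencies
    return [(c, counts[c], r, cr) for c, r, cr in zip(keys, relative, cumulative)]


table = frequency_table(articles_with)
print("citations articles  relative cumulative")
for citations, n, rel, cum in table:
    print(f"{citations:>9} {n:>8} {rel:>9.3f} {cum:>10.3f}")
```

The last cumulative relative frequency is always one (up to rounding), which is a quick sanity check on such a table.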