Page 80 - Becoming Metric Wise
Indeed, following this construction we have $kW = R + W$, or $(k - 1)W = R$, hence $W = R/(k - 1)$. Yet, in practice this simple construction is not always possible, and one needs one or two rest classes: e.g., one class containing every observation smaller than a given number (the upper bound of the first interval), and one containing everything larger than the lower limit of the last interval. Next, one determines how many observations belong to each of the classes, leading to an absolute frequency distribution. A graphical representation of the resulting frequency distribution is called a histogram.
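The class construction and the counting step can be sketched in Python. This is a minimal sketch under an assumption: it reads the construction as placing the midpoint of the first class on the smallest observation and the midpoint of the last class on the largest one, which gives $W = R/(k - 1)$. Rest classes are not handled, and the function names are hypothetical.

```python
import math

def equal_width_classes(data, k):
    """Return (W, bounds) for k equal-width classes covering the data.

    Assumption: the first class is centred on min(data) and the last on
    max(data), so that k*W = R + W and hence W = R / (k - 1).
    """
    if k < 2:
        raise ValueError("need at least two classes")
    lo, hi = min(data), max(data)
    R = hi - lo
    if R == 0:
        raise ValueError("all observations are equal; the range R is zero")
    W = R / (k - 1)
    # Class i is centred on lo + i*W and covers the half-open
    # interval [lo + (i - 0.5)*W, lo + (i + 0.5)*W).
    bounds = [(lo + (i - 0.5) * W, lo + (i + 0.5) * W) for i in range(k)]
    return W, bounds

def absolute_frequencies(data, k):
    """Count how many observations fall into each of the k classes."""
    W, _ = equal_width_classes(data, k)
    lo = min(data)
    counts = [0] * k
    for x in data:
        # Index of the class whose centre is nearest to x; a value lying
        # exactly on a class boundary goes to the upper class.
        i = math.floor((x - lo) / W + 0.5)
        counts[i] += 1
    return counts

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(equal_width_classes(data, 6)[0])  # 2.0
print(absolute_frequencies(data, 6))    # [1, 2, 2, 2, 2, 2]
```

Assigning each observation to the class with the nearest midpoint is equivalent to locating it in the half-open intervals, and it avoids a search over the bounds.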
The classical histogram is the counterpart of a bar diagram for data grouped into classes. Instead of drawing a bar, one draws a rectangle centered on the midpoint of a bin (or class). The bins are specified as adjacent, nonoverlapping intervals of a variable. Histograms using relative frequencies provide a rough idea of the density of the population from which the data have been collected. Note that, for such density histograms, relative frequencies must be represented by areas, not by heights: the height of each rectangle is the relative frequency of its bin divided by the bin width. Only when all bins have the same width are the heights proportional to the relative frequencies.
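The area rule for density histograms amounts to one division per bin. The sketch below uses hypothetical counts for three adjacent bins of unequal width; it shows that the areas (height times width), not the heights, are the relative frequencies, and that they add up to one.

```python
def density_heights(counts, widths):
    """Height of each rectangle in a density histogram.

    The relative frequency of a bin is represented by the rectangle's
    area, so height = (count / total) / bin width.
    """
    n = sum(counts)
    return [c / n / w for c, w in zip(counts, widths)]

# Hypothetical counts for three adjacent bins of unequal width.
counts = [10, 6, 4]
widths = [1.0, 2.0, 4.0]
heights = density_heights(counts, widths)
print(heights)  # [0.5, 0.15, 0.05]

# The areas are the relative frequencies 0.5, 0.3 and 0.2; they sum to one.
areas = [h * w for h, w in zip(heights, widths)]
print(sum(areas))
```

Plotting the raw relative frequencies 0.5, 0.3 and 0.2 as heights would instead make the wide bins look far too important.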
A density histogram usually has one peak (it is then said to be unimodal) and is either approximately symmetric or skewed. If there is a long tail on the right-hand side, the histogram is said to be right-skewed; if the long tail is on the left, it is left-skewed. Most histograms encountered in informetrics are right-skewed. For instance, there are many scientists publishing relatively few articles and few scientists publishing a large number of articles, which leads to a right-skewed distribution. If the data originate from two sets with different characteristics, this may lead to a bimodal histogram, i.e., one with two peaks.
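Instead of drawing rectangles, one may connect the midpoints of the upper sides of the rectangles. These points are then joined to the midpoint of the first empty class on the left and to that of the first empty class on the right. The resulting curve is called a frequency polygon. When the histogram is a density histogram with bins of equal width, the area under the frequency polygon is one, just like the total area of the rectangles: each triangle that the polygon cuts off from a rectangle is matched by a congruent triangle that it adds outside the rectangles.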
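The frequency polygon and the claim about its area can be checked numerically. This sketch assumes equal-width classes with density heights (relative frequency divided by class width); the helper names and the example frequencies are hypothetical. The area is computed with the trapezoidal rule, which is exact for a piecewise-linear curve.

```python
def frequency_polygon(midpoints, rel_freqs):
    """Vertices of the frequency polygon of an equal-width density histogram.

    The polygon is closed off at the midpoints of the empty classes
    immediately to the left and to the right of the data.
    """
    W = midpoints[1] - midpoints[0]           # common class width
    heights = [f / W for f in rel_freqs]      # density heights
    xs = [midpoints[0] - W] + list(midpoints) + [midpoints[-1] + W]
    ys = [0.0] + heights + [0.0]
    return list(zip(xs, ys))

def area_under(vertices):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]))

# Hypothetical classes of width 10 with midpoints 5, 15, 25, 35.
poly = frequency_polygon([5, 15, 25, 35], [0.1, 0.3, 0.4, 0.2])
print(poly[0], poly[-1])         # (-5, 0.0) (45, 0.0)
print(round(area_under(poly), 12))  # 1.0
```

Because both end vertices have height zero, each interior height is counted twice with weight $W/2$, so the area equals $W$ times the sum of the density heights, i.e., the sum of the relative frequencies.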
4.2.4 An Illustration
Table 4.1 shows (fictitious) publication and citation data of a scientist during a given publication and citation window. This scientist has published a total of 30 articles, and the table shows how many of them have received zero citations, how many one citation, and so on. The next column shows relative frequencies, and the last column shows cumulative relative frequencies (see also Subsection 4.4.1).
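The derived columns of such a table follow from the absolute frequencies alone. The sketch below uses made-up counts that also total 30 articles; they are NOT the values of Table 4.1. Exact fractions are used so that the last cumulative relative frequency is exactly one, free of rounding error; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def frequency_table(counts):
    """Rows (citations, absolute, relative, cumulative relative).

    counts[i] is the number of articles that received i citations.
    """
    n = sum(counts)
    rel = [Fraction(c, n) for c in counts]
    cum = list(accumulate(rel))   # running sum of relative frequencies
    return [(i, c, r, cr) for i, (c, r, cr) in enumerate(zip(counts, rel, cum))]

# Hypothetical counts (NOT Table 4.1): 8 uncited articles, 7 cited once, ...
counts = [8, 7, 5, 4, 3, 2, 1]   # 30 articles in total
for cites, absolute, rel, cum in frequency_table(counts):
    print(cites, absolute, f"{float(rel):.3f}", f"{float(cum):.3f}")
```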