Page 76 - Becoming Metric Wise

CHAPTER 4

Statistics

4.1 INTRODUCTION
Statistical analysis can be subdivided into two parts: descriptive statistics
and inferential statistics. In descriptive statistics, one summarizes and
graphically represents data of a sample or a whole population. In inferential
statistics, one not only collects numerical data as a sample from a population
but also analyzes it and, based on this analysis, draws conclusions
with estimated uncertainties (i.e., by using probability theory) about the
population. It goes without saying that in order to measure aspects of
scientific communication and to evaluate scientific research, scientists use
statistical techniques. Although hundreds of books have been written on
statistics, few deal explicitly with statistics in the framework of
information and library science. A basic introductory text for library professionals
is Vaughan (2001), while Egghe and Rousseau (2001) is more elementary.
One quarter of Introduction to Informetrics (Egghe & Rousseau, 1990) is
devoted to statistics. Ding et al. (2014) contains a practical introduction to
recent developments in informetrics, including statistical methods.
The term population refers to the set of entities (physical or abstract
ones) about which one seeks information. The publications of scientists
forming a research group, of scientists in a country, or of scientists active
in a scientific domain, as well as the articles included in Scopus and
published during the year 2015, are all examples of populations.
In order to investigate a population, the investigator collects data. If it
is possible, the best option is to include the whole population in this
investigation. Yet, it is often impossible to collect data on the whole
population, so the statistician collects a representative sample. This means that
a subset is collected in such a way that it provides a miniature image of
the whole population. If, moreover, the sample is large enough, then a
diligent analysis of the sample will lead to conclusions that are, to a large
extent, also valid for the whole population. Such conclusions must be
reliable, which implies that the probability of their being correct must be known.
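As a minimal sketch of the idea of sampling, the following Python fragment draws a simple random sample from a synthetic population and attaches an approximate 95% confidence interval to the sample mean. The population (invented citation counts for 10,000 articles), the sample size of 400, the helper name mean_confidence_interval, and the use of a normal approximation with z = 1.96 are all assumptions made for illustration; they are not taken from the text, and the finite population correction is ignored.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) for the sample mean.

    Uses a normal approximation; z = 1.96 corresponds to roughly 95%
    confidence. Hypothetical helper, written for illustration only.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Synthetic, skewed "citation counts" for a hypothetical population
# of 10,000 articles (not real data).
population = [int(rng.expovariate(1 / 8)) for _ in range(10_000)]

# Simple random sample without replacement.
sample = rng.sample(population, 400)

population_mean = statistics.fmean(population)
sample_mean, lower, upper = mean_confidence_interval(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"approx. 95% CI:  [{lower:.2f}, {upper:.2f}]")
```

In practice the population mean is unknown; it is computed here only because the population is synthetic, which makes it possible to check whether the interval obtained from the sample covers it.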
Classical inferential statistics draws samples from a population and then
tries to obtain conclusions that are valid for the whole population (with a
