Page 31 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
10 1 Introduction
Several statistics, whose only assumption is the existence of a total order
relation, can be applied to ordinal data. One such statistic is the median, as shown
in Example 1.2.
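A minimal Python sketch of this point follows. It assumes hypothetical ordered response categories; the labels and data are illustrative and are not taken from Example 1.2. The median is obtained from ranks alone, and any strictly increasing recoding of the categories leaves it unchanged.

```python
from statistics import median_low

# Hypothetical ordered categories: the only structure assumed is their order.
LEVELS = ["very poor", "poor", "fair", "good", "very good"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def ordinal_median(labels, codes=None):
    """Median category of ordinal data, computed from the order relation only.

    `codes` is an optional strictly increasing numeric coding of LEVELS
    (a hypothetical parameter for illustration); by default the ranks
    0, 1, 2, ... are used.
    """
    if codes is None:
        codes = list(range(len(LEVELS)))
    # median_low always returns an observed value (the lower middle one when
    # the sample size is even), so no averaging of categories takes place.
    m = median_low(codes[RANK[label]] for label in labels)
    return LEVELS[codes.index(m)]


answers = ["good", "fair", "very good", "poor", "good"]
print(ordinal_median(answers))                             # good
print(ordinal_median(answers, codes=[1, 2, 10, 50, 99]))   # good
```

Using `median_low` rather than `median` is a deliberate choice here. Averaging the two middle categories of an even-sized sample would implicitly assume an interval scale, which ordinal data do not provide.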
Continuous variables have a real number interval (or a union of intervals) as
domain, and their measurement scale is unique up to a linear transformation. One can further distinguish
between ratio type variables, supporting linear transformations of the y = ax type,
and interval type variables, supporting linear transformations of the y = ax + b type.
The domain of ratio type variables has a fixed zero. This is the most frequently encountered type
of continuous variable, as in Example 1.3 (a zero ohm resistance is a
zero resistance in whatever measurement scale we choose). The whole
panoply of statistics is supported by continuous ratio type variables. The less
common interval type variables do not have a fixed zero. An example of interval
type data is temperature, which can be measured either in degrees Celsius ($X_C$)
or in degrees Fahrenheit ($X_F$), the two being related by $X_F = 1.8X_C + 32$. Only a few, less
frequently used statistics, namely those requiring a fixed zero, are not supported by this
type of variable.
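The difference between the two scale types can be checked numerically. The following Python sketch uses hypothetical temperatures and resistances. A change of units of the y = ax + b type (Celsius to Fahrenheit) preserves ratios of differences and transforms the mean and standard deviation consistently, but it does not preserve ratios of values or the coefficient of variation. A change of the y = ax type (ohms to kilohms) preserves both. The coefficient of variation is used here only as one example of a statistic that needs a fixed zero.

```python
import math
from statistics import mean, stdev


def celsius_to_fahrenheit(x_c):
    # Interval scale change of units: y = a*x + b with a = 1.8, b = 32
    return 1.8 * x_c + 32


def ohm_to_kilohm(r):
    # Ratio scale change of units: y = a*x with a = 0.001
    return 0.001 * r


def cv(xs):
    # Coefficient of variation: only meaningful when the scale has a fixed zero
    return stdev(xs) / mean(xs)


temps_c = [10.0, 20.0, 40.0]                 # hypothetical temperatures in deg C
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]

# Ratios of values are NOT invariant under y = a*x + b (2.0 in C, 1.36 in F) ...
print(temps_c[1] / temps_c[0], temps_f[1] / temps_f[0])
# ... but ratios of differences are (2.0 in both scales).
print((temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0]),
      (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0]))
# Mean and standard deviation follow the transformation consistently.
assert math.isclose(mean(temps_f), celsius_to_fahrenheit(mean(temps_c)))
assert math.isclose(stdev(temps_f), 1.8 * stdev(temps_c))
# The coefficient of variation depends on the arbitrary zero of the scale.
print(cv(temps_c), cv(temps_f))

resist = [100.0, 200.0, 470.0]               # hypothetical resistances in ohms
resist_k = [ohm_to_kilohm(r) for r in resist]
# With y = a*x, both ratios of values and the CV are unit-independent.
assert math.isclose(resist[1] / resist[0], resist_k[1] / resist_k[0])
assert math.isclose(cv(resist), cv(resist_k))
```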
Notice that, strictly speaking, there is no such thing as continuous data, since all
data can only be measured with finite precision. If, for example, one is dealing
with data representing people’s height in meters, “real-flavour” numbers such as
1.82 m may be used. Of course, if the highest measurement precision is the
centimetre, one is in fact dealing with integer numbers such as 182 cm, i.e., the
height data is actually ordinal data. In practice, however, one often assumes that
there is a continuous domain underlying the ordinal data. For instance, one often
assumes that the height data can be measured with arbitrarily high precision. Even
for rank data such as the examination scores of Example 1.2, one often computes
an average score, obtaining a value in the continuous interval [0, 5], i.e., one is
implicitly assuming that the examination scores can be measured with a higher
precision.
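A short Python sketch with hypothetical heights and scores illustrates both remarks. Rounding to the centimetre turns the heights into integers, yet each rounding error is at most half a centimetre, so the mean of the rounded values differs from the mean of the unrounded ones by at most half a centimetre as well. This is one reason why treating such data as continuous is usually harmless. Averaging integer scores on a 0 to 5 scale, on the other hand, yields a non-integer value in [0, 5].

```python
from statistics import mean

# Hypothetical "true" heights in metres (imagined with unlimited precision).
heights_m = [1.8234, 1.6571, 1.7049, 1.9102]

# Measuring to the nearest centimetre yields integers, e.g. 182 cm.
heights_cm = [round(h * 100) for h in heights_m]
print(heights_cm)  # [182, 166, 170, 191]

# Each rounding error is at most 0.5 cm, so the error of the mean is too.
error_cm = abs(mean(heights_cm) - 100 * mean(heights_m))
assert error_cm <= 0.5

# Hypothetical integer examination scores on a 0..5 scale.
scores = [3, 4, 4, 5, 2]
avg = mean(scores)
print(avg)  # 3.6
assert 0 <= avg <= 5
```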
1.4 Probabilities and Distributions
The process of statistically analysing a dataset involves operating with an
appropriate measure expressing the randomness exhibited by the dataset. This
measure is the probability measure. In this section, we will introduce a few topics
of Probability Theory that are needed for the understanding of the following
material. Readers familiar with Probability Theory can skip this section. A more
detailed (though still brief) survey of Probability Theory can be found in
Appendix A.
1.4.1 Discrete Variables
The beginnings of Probability Theory can be traced far back in time to studies of
games of chance. The work of the Swiss mathematician Jacob Bernoulli (1654-1705),
Ars Conjectandi, represented a keystone in the development of a Theory of