Page 29 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R



This can be done, for instance, with the help of a random number generator. In
practice this “simple” task might not be so simple after all (as when we conduct
statistical studies in a human population). The sampling topic is discussed in
several books, e.g. (Blom G, 1989) and (Anderson TW, Finn JD, 1996). Examples
of statistical malpractice, namely by poor sampling, can be found in (Jaffe AJ,
Spirer HF, 1987). The sampling issue is part of the planning phase of the statistical
investigation. The reader can find a good explanation of this topic in (Montgomery
DC, 1984) and (Blom G, 1989).
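As a minimal sketch of how a random number generator can be used to draw a sample, the following Python fragment selects units from a labelled population so that every subset of the requested size is equally likely. The population of 1000 numbered units, the sample size of 20, the seed, and the helper name simple_random_sample are all illustrative assumptions, not taken from the text.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units from population without replacement.

    A seed is accepted only so that the draw can be reproduced.
    """
    rng = random.Random(seed)
    return rng.sample(list(population), n)


# Hypothetical population: 1000 units labelled 1..1000.
population_ids = range(1, 1001)

# Hypothetical sample size and seed.
sample_ids = simple_random_sample(population_ids, 20, seed=42)
```

In a real study the difficult part is usually obtaining a complete list of the population, not the draw itself, which is the point made above about human populations.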
In the case of temporal data a subtler point has to be addressed. Imagine that we
are presented with a list (sequence) of voltage values generated by thermal noise in
an electrical resistor. This sequence should be considered as an instance of a
random process capable of producing an infinite number of such sequences.
Statistics can then be computed either for the ensemble of instances or for the time
sequence of the voltage values. For instance, one could compute a mean voltage
value in two different ways: first, assuming one has available a sample of voltage
sequences randomly drawn from the ensemble, one could compute the mean
voltage value at, say, t = 3 seconds, for all sequences; and, secondly, assuming one
such sequence lasting 10 seconds is available, one could compute the mean voltage
value for the duration of the sequence. In the first case, the sample mean is an
estimate of an ensemble mean (at t = 3 s); in the second case, the sample mean is
an estimate of a temporal mean. Fortunately, in a vast number of situations,
corresponding to what are called ergodic random processes, one can derive
ensemble statistics from temporal statistics, i.e., one can limit the statistical study
to the study of only one time sequence. This applies to the first two examples of
random processes previously mentioned (as a matter of fact, thermal noise and dice
tossing are ergodic processes; Brownian motion is not).
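The distinction between the two means can be illustrated with a small simulation. The sketch below assumes that thermal noise can be modelled as zero-mean white Gaussian noise; the sampling rate, the noise standard deviation, and the number of simulated sequences are arbitrary choices made only for this illustration.

```python
import random
import statistics

SAMPLE_RATE_HZ = 100    # assumed sampling rate
DURATION_S = 10         # sequence duration, as in the example above
NOISE_SIGMA_V = 1e-6    # assumed noise standard deviation, in volts
N_SEQUENCES = 500       # assumed size of the sample drawn from the ensemble


def noise_sequence(rng):
    """Return one simulated 10-second voltage sequence (white Gaussian noise)."""
    n_samples = SAMPLE_RATE_HZ * DURATION_S
    return [rng.gauss(0.0, NOISE_SIGMA_V) for _ in range(n_samples)]


rng = random.Random(0)
ensemble = [noise_sequence(rng) for _ in range(N_SEQUENCES)]

# Ensemble mean: average over all sequences at the single instant t = 3 s.
index_3s = 3 * SAMPLE_RATE_HZ
ensemble_mean_at_3s = statistics.fmean(seq[index_3s] for seq in ensemble)

# Temporal mean: average over the whole duration of one single sequence.
temporal_mean = statistics.fmean(ensemble[0])
```

Under the assumed model both estimates lie close to zero, which is what ergodicity leads one to expect: the single-sequence temporal mean can stand in for the ensemble mean.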

1.3 Random Variables

A random dataset presents the values of random variables. These establish a
mapping between an event domain and some conveniently chosen value domain
(often a subset of ℜ). A good understanding of what the random variables are and
which mappings they represent is an essential precondition of any
statistical analysis. A rigorous definition of a random variable (sometimes
abbreviated to r.v.) can be found in Appendix A.
Usually the value domain of a random variable has a direct correspondence to
the outcomes of a random experiment, but this is not compulsory. Table 1.4 lists
random variables corresponding to the examples of the previous section. Italicised
capital letters are used to represent random variables, sometimes with an
identifying subscript. The Table 1.4 mappings between the event and the value
domain are:

X_F: {commerce, industry, services} → {1, 2, 3}.
X_E: {bad, mediocre, fair, good, excellent} → {1, 2, 3, 4, 5}.
X_R: [90 Ω, 110 Ω] → [90, 110].
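The three mappings listed above can be written out as functions from events to values. This is only a sketch: the names x_f, x_e and x_r, the dictionary names, and the choice to reject resistances outside the stated interval are illustrative assumptions.

```python
# Discrete mappings: each event label is assigned an integer code.
FIRM_TYPE_CODES = {"commerce": 1, "industry": 2, "services": 3}
EVALUATION_CODES = {"bad": 1, "mediocre": 2, "fair": 3, "good": 4, "excellent": 5}


def x_f(event):
    """Map a firm type to its numeric value in {1, 2, 3}."""
    return FIRM_TYPE_CODES[event]


def x_e(event):
    """Map an evaluation label to its numeric value in {1, ..., 5}."""
    return EVALUATION_CODES[event]


def x_r(resistance_ohm):
    """Map a resistance in [90, 110] ohm to the real number in [90, 110]."""
    if not 90.0 <= resistance_ohm <= 110.0:
        raise ValueError("resistance outside the event domain [90, 110] ohm")
    return float(resistance_ohm)
```

For the first two variables the numeric codes are a convenient choice, not something dictated by the events themselves; for the third the value is simply the measured quantity stripped of its unit.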
