Page 29 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

8        1 Introduction


           This can be done, for instance, with the help of a random number generator. In
           practice this “simple” task might not be so simple after all (as when we conduct
           statistical studies in a human population). The sampling topic is discussed in
           several books, e.g. (Blom G, 1989) and (Anderson TW, Finn JD, 1996). Examples
           of statistical malpractice, notably through poor sampling, can be found in (Jaffe AJ,
           Spirer HF, 1987). The sampling issue is part of the planning phase of the statistical
           investigation. The reader can find a good explanation of this topic in (Montgomery
           DC, 1984) and (Blom G, 1989).
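
As a minimal sketch of how a random number generator can support sampling, the Python fragment below draws a simple random sample without replacement. It assumes that the task referred to above is the random selection of cases from a population; the population of 1000 numbered units, the sample size of 50, the seed and the function name `simple_random_sample` are all hypothetical choices made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator, so the draw is reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 1000 units identified by the labels 1..1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=1)
print(sorted(sample)[:5])
```

Fixing the seed only makes the illustration repeatable. It does nothing to address the practical difficulties mentioned above, such as reaching every selected member of a human population.
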
              In the case of temporal data a subtler point has to be addressed. Imagine that we
           are presented with a list (sequence) of voltage values generated by thermal noise in
           an electrical resistor. This sequence should be considered as one instance of a
           random process capable of producing an infinite number of such sequences.
           Statistics can then be computed either for the ensemble of instances or for the time
           sequence of the voltage values. For instance, one could compute a mean voltage
           value in two different ways: first, assuming one has available a sample of voltage
           sequences randomly drawn from the ensemble, one could compute the mean
           voltage value at, say, t = 3 seconds, over all sequences; and, secondly, assuming one
           such sequence lasting 10 seconds is available, one could compute the mean voltage
           value over the duration of the sequence. In the first case, the sample mean is an
           estimate of an ensemble mean (at t = 3 s); in the second case, the sample mean is
           an estimate of a temporal mean. Fortunately, in a vast number of situations,
           corresponding to what are called ergodic random processes, one can derive
           ensemble statistics from temporal statistics, i.e., one can limit the statistical study
           to the study of only one time sequence. This applies to the first two examples of
           random processes previously mentioned (as a matter of fact, thermal noise and dice
           tossing are ergodic processes; Brownian motion is not).
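
The two ways of computing a mean voltage can be illustrated with a small simulation. The sketch below is not a physical model: it assumes that thermal noise can be represented by zero-mean Gaussian white noise, and the sampling rate (100 samples per second), the number of sequences (500), the standard deviation and all names are hypothetical choices made for illustration.

```python
import random
import statistics

def noise_sequence(rng, n_samples, sigma=1.0):
    """One instance of the process: zero-mean Gaussian white noise (assumed model)."""
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]

rng = random.Random(0)
fs = 100                    # assumed sampling rate, in samples per second
duration = 10               # sequence length in seconds, as in the text
n_samples = fs * duration

# A sample of sequences drawn from the ensemble.
ensemble = [noise_sequence(rng, n_samples) for _ in range(500)]

# First way: mean over all sequences at the fixed instant t = 3 s.
index_3s = 3 * fs
ensemble_mean = statistics.fmean(seq[index_3s] for seq in ensemble)

# Second way: mean over the whole duration of one single sequence.
temporal_mean = statistics.fmean(ensemble[0])

print(f"ensemble mean at t = 3 s: {ensemble_mean:.3f}")
print(f"temporal mean of one sequence: {temporal_mean:.3f}")
```

For this assumed process both estimates fluctuate around the same value (zero), which is what ergodicity in the mean leads one to expect. For a non-ergodic process, the temporal mean of a single sequence would not in general be a valid estimate of the ensemble mean.
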


           1.3 Random Variables


           A random dataset presents the values of random variables. These establish a
           mapping between an event domain and some conveniently chosen value domain
           (often a subset of ℜ). A good understanding of what the random variables are and
           which mappings they represent is an essential preliminary condition for any
           statistical analysis. A rigorous definition of a random variable (sometimes
           abbreviated to r.v.) can be found in Appendix A.
              Usually the value domain of a random variable has a direct correspondence to
           the outcomes of a random experiment, but this is not compulsory. Table 1.4 lists
           random variables corresponding to the examples of the previous section. Italicised
           capital letters are used to represent random variables, sometimes with an
           identifying subscript. The mappings in Table 1.4 between the event domain and
           the value domain are:

              X_F:  {commerce, industry, services}  →  {1, 2, 3}.
              X_E:  {bad, mediocre, fair, good, excellent}  →  {1, 2, 3, 4, 5}.
              X_R:  [90 Ω, 110 Ω]  →  [90, 110].
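
To make the idea of a mapping concrete, the three random variables above can be sketched in Python as two lookup tables and one function. This is only one possible encoding: the Python names `X_F`, `X_E` and `X_R`, the range check and the list of observed events are illustrative assumptions, not part of Table 1.4.

```python
# Discrete event domains mapped to integer codes, as listed above.
X_F = {"commerce": 1, "industry": 2, "services": 3}
X_E = {"bad": 1, "mediocre": 2, "fair": 3, "good": 4, "excellent": 5}

def X_R(resistance_ohms):
    """Map a resistance in [90 Ω, 110 Ω] to the real number in [90, 110]."""
    if not 90.0 <= resistance_ohms <= 110.0:
        raise ValueError("resistance outside the event domain [90 Ω, 110 Ω]")
    return float(resistance_ohms)

# Hypothetical observed events, converted to values of the random variable X_E.
observed = ["good", "fair", "excellent", "good"]
values = [X_E[event] for event in observed]
print(values)       # [4, 3, 5, 4]
print(X_R(100))     # 100.0
```

The integer codes of `X_F` serve only as labels for the categories, a reminder that the value domain is a matter of convenient choice.
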