Page 26 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R


              Therefore, the next number in the “random number” sequence is obtained by
           computing the remainder of the integer division of α times the previous number by
           a suitable constant, m. In order to obtain a convenient “random-like” behaviour of
           this purely deterministic sequence, when using numbers represented with p binary
           digits, one must use $m = 2^p$ and $\alpha = 2^{\lfloor p/2 \rfloor} + 3$, where
           $\lfloor p/2 \rfloor$ is the largest integer not exceeding p/2. The periodicity of
           the sequence is then $2^{p-2}$. Figure 1.4 illustrates one such sequence.


           [Plot of x_n (vertical axis, 0 to 1200) against n (horizontal axis, 0 to 100).]

           Figure 1.4. “Random number” sequence using p = 10 binary digits with
           $m = 2^p = 1024$, α = 35 and initial value $x(0) = 2^p - 3 = 1021$.
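
The recurrence described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the function names `lcg_sequence` and `lcg_period` are assumptions introduced here, and the parameters are those quoted for Figure 1.4 (p = 10, so m = 1024 and α = 35, with initial value 1021).

```python
def lcg_sequence(p, x0, count):
    """Return the first `count` terms of x(n+1) = (alpha * x(n)) mod m.

    Uses m = 2**p and alpha = 2**(p // 2) + 3, as stated in the text.
    """
    m = 2 ** p                    # modulus m = 2^p
    alpha = 2 ** (p // 2) + 3     # multiplier alpha = 2^floor(p/2) + 3
    x = x0
    values = []
    for _ in range(count):
        values.append(x)
        x = (alpha * x) % m       # remainder of the integer division by m
    return values


def lcg_period(p, x0):
    """Count the steps until the sequence first returns to its initial value."""
    m = 2 ** p
    alpha = 2 ** (p // 2) + 3
    x = (alpha * x0) % m
    steps = 1
    while x != x0:
        x = (alpha * x) % m
        steps += 1
    return steps


# Parameters of Figure 1.4: p = 10 and x(0) = 2**10 - 3 = 1021.
first_terms = lcg_sequence(10, 1021, 100)
print(first_terms[:3])            # [1021, 919, 421]
print(lcg_period(10, 1021))       # 256
```

For these parameters the sequence repeats after 256 = 2^8 steps, in agreement with the stated periodicity of 2^(p−2) for p = 10. Plotting `first_terms` against the index n should give a picture similar to Figure 1.4.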

           1.2 Population, Sample and Statistics


           When studying a collection of data as a random dataset, the basic assumption being
           that no law explains any individual value of the dataset, we attempt to study the
           data by means of some global measures, known as statistics, such as frequencies
           (of data occurrence in specified intervals), means, standard deviations, etc.
              Clearly, these same measures can be applied to a deterministic dataset, but they
           are of little use there: the mean height in a set of height measurements of a falling
           body, for instance, is irrelevant.
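
As a small illustration of the global measures mentioned above (frequencies in specified intervals, mean and standard deviation), the following sketch uses Python's standard `statistics` module. The dataset `heights_cm`, the helper `interval_frequencies` and the choice of 5 cm intervals starting at 160 cm are hypothetical, chosen only for illustration.

```python
import statistics
from collections import Counter

# Hypothetical sample: eight height measurements in centimetres (illustrative only).
heights_cm = [162, 175, 168, 180, 171, 166, 174, 169]


def interval_frequencies(data, start, width):
    """Count the values falling in each interval [start + k*width, start + (k+1)*width)."""
    counts = Counter((x - start) // width for x in data)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }


mean_height = statistics.mean(heights_cm)    # arithmetic mean
std_height = statistics.stdev(heights_cm)    # sample standard deviation (n - 1 divisor)
freqs = interval_frequencies(heights_cm, start=160, width=5)

print(f"mean = {mean_height:.3f} cm, standard deviation = {std_height:.3f} cm")
for (low, high), count in freqs.items():
    print(f"[{low}, {high}) cm: {count}")
```

Each of these numbers summarises the dataset as a whole; none of them describes any individual measurement.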
              Statistics had its beginnings and key developments during the last century,
           especially the last seventy years. Comparing datasets and inferring from a dataset
           the process that generated it were, and still are, important issues addressed by
           statisticians, who have made a definite contribution to advancing scientific
           knowledge in many disciplines (see e.g. Salsburg D, 2001). In an inferential study,
           going from a dataset to the process that generated it, the statistician considers the
           dataset as a sample from a vast, possibly infinite, collection of data called the
           population. Each individual item of a sample is a case (or object). The sample itself
           is a list of values of one or more random variables.
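
The relation between a population and a sample can be sketched with Python's standard `random` module. Everything in this sketch is an assumption made for illustration: the population is a hypothetical finite list of 1000 values, and `draw_random_sample` is a helper name introduced here. `random.Random.sample` draws without replacement, so each item of the population has the same chance of being selected.

```python
import random
import statistics

# Hypothetical finite population of 1000 values (illustrative only).
population = list(range(1, 1001))


def draw_random_sample(pop, size, seed=None):
    """Draw `size` distinct cases from `pop`, each equally likely to be chosen."""
    rng = random.Random(seed)     # a fixed seed makes the draw reproducible
    return rng.sample(pop, size)


sample = draw_random_sample(population, size=20, seed=1)

# The sample mean is a statistic computed from the sample only; here it can be
# compared with the population mean because the whole population is known.
print("sample mean:    ", statistics.mean(sample))
print("population mean:", statistics.mean(population))
```

In practice the population mean would not be available; the comparison is possible here only because the toy population is fully enumerated.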
              The population data is usually not available for study, since most often it is
           either infinite, or finite but very costly to collect. The data sample, obtained from
           the population, should be randomly drawn, i.e., every individual in the population
           should have an equal chance of being part of the sample. Only by studying