Page 79 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2 Presenting and Summarising the Data


           The R boxplot function uses the so-called x~y "formula" to create a box plot of
           x grouped by y. The legend function places the text given in its legend argument
           at the (x,y) position of the plot. The graph of Figure 2.23 (CL is the Class
           variable) was obtained with:

              > boxplot(ART~CL)
              > legend(3.2,100,legend="CL")
              > legend(0.5,900,legend="ART")
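
As a minimal, self-contained sketch of the formula interface, the following uses a small made-up data frame. The values and the names `d`, `art` and `cl` are hypothetical and are not the dataset behind Figure 2.23. Passing `plot = FALSE` makes `boxplot` return the per-group statistics instead of drawing, which shows how the formula splits the numeric variable by the grouping variable.

```r
# Hypothetical data: a numeric variable and a grouping factor (not the book's dataset).
d <- data.frame(
  art = c(120, 340, 560, 210, 450, 800, 90, 300),
  cl  = factor(c(1, 1, 1, 2, 2, 2, 3, 3))
)

# Box plot of art grouped by cl; plot = FALSE returns the statistics without drawing.
b <- boxplot(art ~ cl, data = d, plot = FALSE)

b$names   # group labels, one per box
b$stats   # one column per group: lower whisker, lower hinge, median,
          # upper hinge, upper whisker
```

Dropping `plot = FALSE` draws the boxes, one per level of `cl`.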



           2.3  Summarising the Data

           When analysing a dataset, one usually starts by determining some indices that give
           a global picture of where and how the data is concentrated and of the shape of
           its distribution, i.e., indices that are useful for summarising the data.
           These indices are known as descriptive statistics.



           2.3.1 Measures of Location
           Measures of location are used to determine where the data distribution is
           concentrated. The most common measures of location are presented next.


           Commands 2.7. SPSS, STATISTICA, MATLAB and R commands used to obtain
           measures of location.

             SPSS          Analyze; Descriptive Statistics

             STATISTICA    Statistics; Basic Statistics/Tables;
                           Descriptive Statistics
             MATLAB        mean(x)   ;  trimmean(x,p) ; median(x)    ;
                           prctile(x,p)
             R             mean(x, trim)    ;  median(x)   ;  summary(x);
                           quantile(x,seq(...))
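
The R commands listed above can be tried on a short vector. This is only an illustrative sketch: the vector `x` is made up, with one large value included to show the effect of trimming. In R, the `trim` argument of `mean` is the fraction of observations removed from each end of the sorted data before averaging.

```r
# Hypothetical sample with one large value to show the effect of trimming.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 100)

mean(x)                       # 16.8, pulled up by the value 100
mean(x, trim = 0.1)           # 8.25, drops one observation at each end (10% of 10)
median(x)                     # 8, the average of the two middle values 7 and 9
summary(x)                    # minimum, quartiles, median, mean and maximum
quantile(x, seq(0, 1, 0.25))  # quantiles at 0%, 25%, 50%, 75% and 100%
```

The trimmed mean and the median stay close to the bulk of the data, whereas the plain mean is dominated by the single value 100.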


           2.3.1.1  Arithmetic Mean

           Let x_1, …, x_n be the data. The arithmetic mean (or simply mean) is:

              \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .                    (2.5)
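
Formula 2.5 can be checked numerically with a short sketch. The data values below are hypothetical; the names `x`, `n` and `xbar` are chosen only for illustration.

```r
# Hypothetical data values x_1, ..., x_n.
x <- c(3, 5, 8, 12)
n <- length(x)

# Formula 2.5: sum the observations and divide by n.
xbar <- sum(x) / n

xbar                              # 7
isTRUE(all.equal(xbar, mean(x)))  # TRUE: agrees with the built-in mean
```
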

              The arithmetic mean is the sample estimate of the mean  of the associated
           random variable (see Appendices B and C). If one has a tally sheet of a discrete