Page 79 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
58 2 Presenting and Summarising the Data
The R boxplot function uses the so-called x~y “formula” to create a box plot of
x grouped by y. The legend function places a label as a legend at the (x,y)
position of the plot. The graph of Figure 2.23 (CL is the Class variable) was
obtained with:
> boxplot(ART~CL)
> legend(3.2,100,legend="CL")
> legend(0.5,900,legend="ART")
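The commands above assume that ART and CL are already available in the workspace. As a minimal, self-contained sketch of the same formula usage, the example below builds a hypothetical data frame `d` (its values, the three class levels and the legend positions are invented for illustration and are not the data of Figure 2.23):

```r
# Hypothetical data: values and class labels are invented for illustration.
set.seed(1)
d <- data.frame(
  ART = c(rnorm(20, mean = 300, sd = 50),
          rnorm(20, mean = 500, sd = 80),
          rnorm(20, mean = 400, sd = 60)),
  CL  = factor(rep(1:3, each = 20))
)

# One box per level of CL; the formula reads "ART grouped by CL".
b <- boxplot(ART ~ CL, data = d)

# Place text boxes at user coordinates (x, y) of the current plot.
legend(3.2, min(d$ART), legend = "CL")
legend(0.5, max(d$ART), legend = "ART")
```

Passing `data = d` avoids having to attach the data frame; the object returned by boxplot holds the per-group statistics used to draw the boxes.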
2.3 Summarising the Data
When analysing a dataset, one usually starts by determining some indices that give
a global picture of where and how the data are concentrated and of the shape of
their distribution, i.e., indices that are useful for summarising the data.
These indices are known as descriptive statistics.
2.3.1 Measures of Location
Measures of location are used to determine where the data distribution is
concentrated. The most common measures of location are presented next.
Commands 2.7. SPSS, STATISTICA, MATLAB and R commands used to obtain
measures of location.

SPSS         Analyze; Descriptive Statistics
STATISTICA   Statistics; Basic Statistics/Tables; Descriptive Statistics
MATLAB       mean(x) ; trimmean(x,p) ; median(x) ; prctile(x,p)
R            mean(x, trim) ; median(x) ; summary(x) ; quantile(x,seq(...))
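As a rough illustration of the R commands listed above, the following sketch applies them to a small hypothetical sample. The data vector, the trimming fraction 0.1 and the arguments passed to seq are assumptions chosen for the example, not values taken from the text:

```r
# Hypothetical sample with one outlier (1000), chosen only for illustration.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 1000)

m  <- mean(x)              # arithmetic mean, pulled upwards by the outlier
mt <- mean(x, trim = 0.1)  # drops 10% of the observations at each end first
md <- median(x)            # middle value of the sorted data

# Quartiles; the seq() arguments here are an illustrative choice.
q <- quantile(x, seq(0.25, 0.75, by = 0.25))

s <- summary(x)            # minimum, quartiles, median, mean and maximum

print(c(mean = m, trimmed = mt, median = md))
```

For this sample the mean is 106.8, whereas the 10% trimmed mean is 8.25 and the median is 8, which shows how strongly a single extreme value can affect the arithmetic mean.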
2.3.1.1 Arithmetic Mean
Let $x_1, \ldots, x_n$ be the data. The arithmetic mean (or simply mean) is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \tag{2.5}$$
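Formula 2.5 can be checked by hand on a tiny example. The sketch below uses three hypothetical values (an assumption for illustration) and compares the written-out sum with R's built-in mean function:

```r
x <- c(2, 4, 9)      # hypothetical data
n <- length(x)       # number of observations
xbar <- sum(x) / n   # formula 2.5 written out: (2 + 4 + 9) / 3

print(xbar)
print(mean(x))       # the built-in function gives the same value
```

Both calls print 5.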
The arithmetic mean is the sample estimate of the mean of the associated
random variable (see Appendices B and C). If one has a tally sheet of a discrete