Page 81 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
60 2 Presenting and Summarising the Data
The median satisfies the same linear property as the mean (see A.6.1), but not
the other properties (e.g. additivity). Compared to the mean, the median has the
advantage of being quite insensitive to outliers and extreme cases.
Notice that, if we sort the dataset, the sample median is the central value if the
number of the data values is odd; if it is even, it is computed as the average of the
two most central values.
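The sorting rule and the two properties mentioned above can be sketched in Python as follows. This is only an illustrative sketch: the function name `sample_median` and the data values are hypothetical, and the coefficients 0.2 and 5 are borrowed from Example 2.5 purely for illustration.

```python
from math import isclose
from statistics import mean


def sample_median(values):
    """Median by sorting: the central value for odd n,
    the average of the two most central values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, not taken from any dataset in the text.
odd_data = [7, 3, 9, 1, 5]        # sorted: 1 3 5 7 9
even_data = [7, 3, 9, 1, 5, 11]   # sorted: 1 3 5 7 9 11

median_odd = sample_median(odd_data)     # central value
median_even = sample_median(even_data)   # average of 5 and 7

# Linear property: median(a*x + b) equals a*median(x) + b.
a, b = 0.2, 5
transformed = [a * x + b for x in odd_data]
linear_ok = isclose(sample_median(transformed), a * median_odd + b)

# Insensitivity to outliers: replace 9 by the extreme value 900.
with_outlier = [7, 3, 900, 1, 5]
median_outlier = sample_median(with_outlier)  # unchanged by the outlier
mean_outlier = mean(with_outlier)             # pulled far from the bulk

print(median_odd, median_even, linear_ok, median_outlier, mean_outlier)
```

In this small case the single extreme value moves the mean from 5 to 183.2, whereas the median stays at 5.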
2.3.1.3 Quantiles
The quantile of order $\alpha$ ($0 < \alpha < 1$) of a random variable distribution $F_X(x)$ is
defined as the root of the equation (see A.5.2):
$$F_X(x) = \alpha. \tag{2.9}$$
We denote the root as $x_\alpha$.
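As a hedged illustration of this definition, the root can be found numerically when the distribution function is continuous and non-decreasing. The sketch below uses bisection on a unit-rate exponential distribution, chosen here only as an assumed example because its quantile has the closed form -ln(1 - α). The function names and the bracket [0, 50] are hypothetical choices.

```python
import math


def quantile_by_bisection(cdf, alpha, lo, hi, tol=1e-10):
    """Solve cdf(x) = alpha on [lo, hi] by bisection.

    Assumes cdf is continuous and non-decreasing, and that
    cdf(lo) <= alpha <= cdf(hi).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not cdf(lo) <= alpha <= cdf(hi):
        raise ValueError("the bracket [lo, hi] does not contain the root")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return (lo + hi) / 2


def exponential_cdf(x):
    """Distribution function of a unit-rate exponential variable (assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


x_median = quantile_by_bisection(exponential_cdf, 0.5, 0.0, 50.0)
x_upper = quantile_by_bisection(exponential_cdf, 0.975, 0.0, 50.0)

# Compare with the closed-form quantile -ln(1 - alpha).
print(x_median, -math.log(1 - 0.5))
print(x_upper, -math.log(1 - 0.975))
```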
Likewise, we compute the quantile of order α of a dataset as the value below
which lies a fraction α of the cases of the dataset. The median is therefore the 50%
quantile, or $x_{0.5}$. Commonly used quantiles are:
– Quartiles, corresponding to multiples of 25% of the cases. The box plot
mentioned in 2.2.4 uses the quartiles and the inter-quartile range (IQR) in
order to determine the outliers of the dataset distribution.
– Deciles, corresponding to multiples of 10% of the cases.
– Percentiles, corresponding to multiples of 1% of the cases. We will often
use the percentile p = 2.5% and its complement p = 97.5%.
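The dataset quantile can be sketched in Python under one simple convention: take the smallest data value such that at least a fraction α of the cases lie at or below it. This is an assumption made for illustration; statistical packages generally interpolate between neighbouring values, so their results may differ slightly (for instance, this convention does not average the two central values for the median of an even-sized dataset). The function name and the data are hypothetical.

```python
import math


def empirical_quantile(values, alpha):
    """Smallest data value x such that at least a fraction alpha
    of the cases are less than or equal to x (no interpolation)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("the dataset is empty")
    k = math.ceil(alpha * len(ordered))  # cases required at or below x
    return ordered[k - 1]


# Hypothetical data, not taken from any dataset in the text.
data = [12, 15, 11, 18, 14, 30, 13, 16]   # sorted: 11 12 13 14 15 16 18 30

lower_quartile = empirical_quantile(data, 0.25)
middle = empirical_quantile(data, 0.50)
upper_quartile = empirical_quantile(data, 0.75)
iqr = upper_quartile - lower_quartile     # inter-quartile range

print(lower_quartile, middle, upper_quartile, iqr)
```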
2.3.1.4 Mode
The mode of a dataset is its most frequently occurring value. It is an estimate of
the location of the maximum of the probability or density function.
For continuous type data one should determine the midpoint of the modal bin of
the data grouped into an appropriate number of bins.
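A minimal sketch of this procedure is given below, assuming equal-width bins spanning the range of the data. The function name, the number of bins and the data values are hypothetical. When several bins share the highest count, the sketch simply returns the first one.

```python
def modal_bin_midpoint(values, n_bins):
    """Group the data into n_bins equal-width bins over [min, max]
    and return (midpoint of the fullest bin, list of bin counts)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; binning is not meaningful")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is assigned to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    # First bin with the highest count (ties are resolved to the leftmost bin).
    modal = max(range(n_bins), key=counts.__getitem__)
    left_edge = lo + modal * width
    return left_edge + width / 2, counts


# Hypothetical continuous data, not taken from any dataset in the text.
data = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 7.0, 9.0]
mode_estimate, counts = modal_bin_midpoint(data, n_bins=4)

print(mode_estimate, counts)
```

The estimate depends on the number of bins chosen, so it is worth checking that it is stable for a few reasonable choices.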
When a data distribution exhibits several relative maxima of almost equal value,
we say that it is a multimodal distribution.
Example 2.5
Q: Consider the Cork Stoppers’ dataset. Determine the measures of location
of the variable PRT. Comment on the results. Imagine that we had a new variable,
PRT1, obtained by the following linear transformation of PRT: PRT1 = 0.2 PRT + 5.
Determine the mean and median of PRT1.
A: Table 2.6 shows some measures of location of the variable PRT. Notice that as
a mode estimate we can use the midpoint of the bin [355.3, 606.7], as shown in
Figure 2.17, i.e., 481. Notice also the values of the lower and upper quartiles