Page 81 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

60       2 Presenting and Summarising the Data


              The median satisfies the same linear property as the mean (see A.6.1), but not
           the other properties (e.g. additivity). Compared to the mean, the median has the
           advantage of being quite insensitive to outliers and extreme cases.
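
As a minimal sketch of these two properties, the following Python fragment uses the standard `statistics` module on made-up values (not taken from any dataset in the book). The coefficients a = 0.2 and b = 5 are chosen only for illustration.

```python
import math
import statistics

# Hypothetical sample; in the second list the last value plays the role of an outlier.
clean = [10.0, 20.0, 30.0, 40.0, 50.0]
with_outlier = [10.0, 20.0, 30.0, 40.0, 500.0]

# Linear property: median(a*x + b) equals a*median(x) + b.
a, b = 0.2, 5.0
transformed = [a * x + b for x in clean]
lhs = statistics.median(transformed)
rhs = a * statistics.median(clean) + b
print(math.isclose(lhs, rhs))

# Insensitivity to outliers: the mean shifts, the median does not.
print(statistics.mean(clean), statistics.mean(with_outlier))
print(statistics.median(clean), statistics.median(with_outlier))
```

The comparison uses `math.isclose` rather than `==` because the transformed values are floating-point numbers.
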
              Notice that, once the dataset is sorted, the sample median is the central value if
           the number of data values is odd; if the number is even, the median is computed as
           the average of the two most central values.
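
The odd/even rule just described can be sketched as follows. The function name `sample_median` and the input values are hypothetical, introduced only for illustration.

```python
def sample_median(values):
    """Median of a non-empty sequence, computed by sorting."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the central one.
        return xs[mid]
    # Even number of values: average the two most central ones.
    return (xs[mid - 1] + xs[mid]) / 2


odd_case = sample_median([7, 1, 5])       # sorted: 1, 5, 7
even_case = sample_median([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_case, even_case)
```
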

           2.3.1.3 Quantiles

           The quantile of order α (0 < α < 1) of a random variable distribution $F_X(x)$ is
           defined as the root of the equation (see A.5.2):

              $F_X(x) = \alpha$.                                             (2.9)

              We denote the root as $x_\alpha$.
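
To make the definition concrete, the sketch below finds the root of equation 2.9 numerically by bisection. The choice of the standard normal distribution, the search interval and the helper name `quantile_by_bisection` are assumptions made for this illustration; `NormalDist` comes from Python's standard `statistics` module, and its `inv_cdf` method serves as a reference value.

```python
from statistics import NormalDist


def quantile_by_bisection(cdf, alpha, lo, hi, iterations=100):
    """Approximate the root of cdf(x) = alpha on [lo, hi].

    Assumes cdf is continuous and non-decreasing, with
    cdf(lo) < alpha < cdf(hi).
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return (lo + hi) / 2


standard_normal = NormalDist(mu=0.0, sigma=1.0)
x_975 = quantile_by_bisection(standard_normal.cdf, 0.975, -10.0, 10.0)
reference = standard_normal.inv_cdf(0.975)
print(x_975, reference)
```

For the standard normal distribution the quantile of order 0.975 is approximately 1.96.
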
              Likewise, we compute the quantile of order α of a dataset as the value below
           which lies a fraction α of the cases of the dataset. The median is therefore the 50%
           quantile, or $x_{0.5}$. Commonly used quantiles are:

              –  Quartiles, corresponding to multiples of 25% of the cases. The box plot
                 mentioned in 2.2.4 uses the quartiles and the inter-quartile range (IQR) in
                 order to determine the outliers of the dataset distribution.
              –  Deciles, corresponding to multiples of 10% of the cases.
              –  Percentiles, corresponding to multiples of 1% of the cases. We will often
                 use the percentile p = 2.5% and its complement p = 97.5%.
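
A minimal sketch of a dataset quantile, under one simple convention: the smallest data value with at least a fraction α of the cases at or below it. The function name `empirical_quantile` and the data are hypothetical. Statistical packages implement several interpolation rules, so their results may differ slightly from this one.

```python
import math


def empirical_quantile(values, alpha):
    """Smallest data value with at least a fraction alpha of cases at or below it."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("quantile of an empty dataset is undefined")
    # round() guards against floating-point noise in alpha * n before the ceiling
    k = max(1, math.ceil(round(alpha * n, 9)))
    return xs[k - 1]


data = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical values
q1 = empirical_quantile(data, 0.25)    # lower quartile
q2 = empirical_quantile(data, 0.50)    # 50% quantile
q3 = empirical_quantile(data, 0.75)    # upper quartile
iqr = q3 - q1                          # inter-quartile range
print(q1, q2, q3, iqr)
```

With an even number of cases this convention returns 4 for the 50% quantile, whereas averaging the two most central values gives 4.5; the discrepancy is a consequence of the chosen rule, not an error.
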

           2.3.1.4 Mode

           The mode of a dataset is its most frequently occurring value. It is an estimate of
           the location of the maximum of the probability or density function.
              For continuous-type data, one should determine the midpoint of the modal bin
           of the data grouped into an appropriate number of bins.
              When a data distribution exhibits several relative maxima of almost equal value,
           we say that it is a multimodal distribution.
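
The two mode estimates can be sketched as follows: the most frequently occurring value for discrete data, and the midpoint of the modal bin for continuous data. The function names, the equal-width binning between the minimum and maximum, and the data values are assumptions made for this illustration.

```python
from collections import Counter


def discrete_mode(values):
    """Most frequently occurring value (first one encountered in case of ties)."""
    return Counter(values).most_common(1)[0][0]


def modal_bin_midpoint(values, n_bins):
    """Midpoint of the fullest of n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return lo  # all values identical
    counts = [0] * n_bins
    for v in values:
        # The maximum value is assigned to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    k = counts.index(max(counts))
    return lo + (k + 0.5) * width


mode_discrete = discrete_mode([3, 1, 3, 2, 3, 1])
mode_continuous = modal_bin_midpoint(
    [1.0, 1.2, 2.1, 2.3, 2.4, 2.6, 3.9, 5.0], n_bins=4
)
print(mode_discrete, mode_continuous)
```

The modal-bin estimate depends on the number of bins chosen, so it should be read together with the histogram.
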

           Example 2.5

           Q: Consider the Cork Stoppers’ dataset. Determine the measures of location
           of the variable PRT. Comment on the results. Imagine that we had a new variable,
           PRT1, obtained by the following linear transformation of PRT: PRT1 = 0.2 PRT + 5.
           Determine the mean and median of PRT1.
           A: Table 2.6 shows some measures of location of the variable PRT. Notice that as
           a mode estimate we can use the midpoint of the bin [355.3, 606.7] as shown in
           Figure 2.17, i.e., 481. Notice also the values of the lower and upper quartiles