Page 82 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data   61


           delimiting 50% of the cases. The large distance of the 95th percentile from the
           upper quartile, compared with the distance of the 5th percentile from the lower
           quartile, is evidence of a right-skewed (asymmetrical) distribution.
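The tail comparison can be checked directly with the values of Table 2.6. A minimal R sketch; the variable names (p05, q1, q3, p95, upper_tail, lower_tail) are hypothetical and chosen only for illustration:

```r
# Location measures of PRT taken from Table 2.6 (STATISTICA output)
p05 <- 246; q1 <- 410; q3 <- 974; p95 <- 1400

# Extent of each tail beyond its quartile
upper_tail <- p95 - q3   # 1400 - 974 = 426
lower_tail <- q1 - p05   # 410 - 246  = 164

# A markedly longer upper tail points to right (positive) skewness
upper_tail > lower_tail  # TRUE
```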
              Since PRT1 = 0.2 PRT + 5, and both the mean and the median follow such a linear
           transformation, we have:

              Mean(PRT1)   = 0.2 Mean(PRT) + 5   ≈ 147;
              Median(PRT1) = 0.2 Median(PRT) + 5 ≈ 131.
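The property can be verified numerically. The sketch below uses a small hypothetical vector x (not the cork stopper data) and then applies the same transformation to the Table 2.6 values of PRT:

```r
# Hypothetical values, used only to illustrate the property
x <- c(120, 180, 250, 400, 610, 900)

# Same linear transformation as PRT1 = 0.2 PRT + 5
y <- 0.2 * x + 5

# Mean and median of y equal the transformed mean and median of x
all.equal(mean(y),   0.2 * mean(x)   + 5)   # TRUE
all.equal(median(y), 0.2 * median(x) + 5)   # TRUE

# Applied to the Table 2.6 mean and median of PRT
round(0.2 * 710.3867 + 5)   # 147
round(0.2 * 629 + 5)        # 131
```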


           Table 2.6. Location measures (computed with STATISTICA) for variable PRT of
           the cork stopper dataset (150 cases).
             Mean       Median     Lower      Upper      Percentile  Percentile
                                   Quartile   Quartile   5%          95%
             710.3867   629.0000   410.0000   974.0000   246.0000    1400.000


              An important aspect to consider when using values computed with
           statistical software is the precision of the results, expressed by the number of
           significant digits. Almost every software product will output results with a large
           number of digits, regardless of whether those digits are meaningful. For
           instance, in the case of the PRT variable (Table 2.6) it would be foolish to publish
           that the mean of the total perimeter of the defects of the cork stoppers is 710.3867.
           First of all, the least significant digit is, in this case, the unit (no perimeter can be
           measured in fractions of the pixel unit; see Appendix E). Thus, one would have to
           publish the value rounded to the nearest unit, in this case 710. Second, there are
           omnipresent measurement errors that must be accounted for. Assuming that the
           perimeter measurement error is one unit, the mean is 710 ± 1 (see footnote 3). As a
           matter of fact, even this one-unit precision for the mean is somewhat misleading, as
           we will see in the following chapter. From now on, published results will take this
           issue into account and may, therefore, be appropriately rounded from the values
           obtained with the software products.
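Rounding to the unit and attaching the assumed error can be done in R with round and paste. A minimal sketch; prt_mean, prt_mean_rounded, err and label are hypothetical names, and the one-unit error is the assumption stated in the text:

```r
# Mean of PRT as reported by the software (Table 2.6)
prt_mean <- 710.3867

# Perimeters are measured in whole pixels, so round to the unit
prt_mean_rounded <- round(prt_mean)   # 710

# Assumed measurement error of one unit (see footnote 3)
err <- 1
label <- paste(prt_mean_rounded, "+/-", err)
label   # "710 +/- 1"
```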
              R functions also output a large number of digits, as when computing the
           mean of PRT:

              > mean(PRT)
              [1] 710.3867

              However, the summary function provides a reasonable rounding:

              > summary(PRT)
                 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
                104.0   412.0   629.0   710.4   968.5  1612.0
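The quartiles printed by summary (412.0 and 968.5) differ slightly from those in Table 2.6 (410 and 974). A plausible explanation, not verified here against the STATISTICA documentation, is that software packages use different sample-quantile definitions. R's quantile function implements nine of them, selected with its type argument (type = 7 is the default). A sketch on a hypothetical vector x:

```r
# Hypothetical values, used only to compare quantile definitions
x <- c(100, 200, 300, 400, 500, 600, 700, 900)

# R's default definition (type 7)
q7 <- quantile(x, probs = c(0.25, 0.75), type = 7)   # 275, 625

# An alternative definition (type 6) gives different quartiles
q6 <- quantile(x, probs = c(0.25, 0.75), type = 6)   # 225, 675
```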

           3
             Denoting by ∆x the error of a single measurement, the mean of n measurements has,
             in the worst case (all errors with the same sign and magnitude), an error of
             ±(n·|∆x|)/n = ±|∆x|.
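The worst-case bound of footnote 3 can be illustrated by simulation. In this sketch the true perimeters are hypothetical uniform random values, and the errors are either all equal to ∆x (worst case) or drawn between −∆x and ∆x:

```r
set.seed(1)
n <- 150
true_vals <- runif(n, 100, 1600)   # hypothetical true perimeters
dx <- 1                            # single-measurement error bound

# Worst case: every measurement is off by +dx
worst <- true_vals + dx
mean(worst) - mean(true_vals)      # equals dx (up to floating point)

# Errors of mixed sign, each bounded by |dx|
mixed <- true_vals + runif(n, -dx, dx)
abs(mean(mixed) - mean(true_vals)) <= abs(dx)   # TRUE
```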