Page 92 - Becoming Metric Wise


          4.5.3 The Interquartile Range
          The interquartile range (IQR), defined as the difference between the
          third and the first quartile, is a robust measure of dispersion:
          $$ \mathrm{IQR} = \hat{Q}_n(0.75) - \hat{Q}_n(0.25) \tag{4.10} $$
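
          The definition of the IQR can be illustrated with a minimal Python sketch. The function name interquartile_range and the sample values are hypothetical, and the "inclusive" quantile method of the standard library is an assumption: quartile conventions differ, so the result may deviate slightly from the one obtained with the book's own definition of the quantile function.

```python
from statistics import quantiles


def interquartile_range(data):
    """Return the third quartile minus the first quartile of the data."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # n=4 returns the three cut points that split the data into quarters.
    # method="inclusive" is an assumed convention, not taken from the book.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interquartile_range(values))  # 4.0

# Replacing the largest value by an outlier leaves the IQR unchanged,
# which illustrates why the IQR is called a robust measure.
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 900]))  # 4.0
```
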
          4.5.4 Skewness

          In Section 4.2.1 we already introduced the term skewness in an intuitive
          way. Now we provide a formula to measure skewness. This expression is
          known as Pearson’s moment coefficient of skewness, in short skewness.
             Skewness, denoted as Sk, is calculated using the following formula
          ($m_2$ and $m_3$ denote the second and third moments about the mean; $n$ is
          the number of data points):

          $$ Sk = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{m_3}{(m_2)^{3/2}} \quad\text{with}\quad m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3 \quad\text{and}\quad m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \tag{4.11} $$

             The factor $\sqrt{n(n-1)}/(n-2)$ is used to reduce bias when skewness is calculated
          from a sample. If the data are left-skewed, skewness is negative; if they are
          right-skewed, it is positive. If a distribution is symmetric, or when mean
          and median coincide, then the skewness coefficient is zero, but the converse
          does not hold: zero skewness does not imply symmetry or that the
          mean is equal to the median. Formula (4.11) was used in Rousseau
          (2014b) to measure skewness in journal citations.
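
          As an illustration, the skewness coefficient with its bias-reducing factor can be computed directly from the second and third moments about the mean. The following Python sketch is only one possible implementation; the function name skewness and the sample data are hypothetical and serve only to show the sign behavior described above.

```python
from math import sqrt


def skewness(data):
    """Moment coefficient of skewness with the factor sqrt(n(n-1))/(n-2)."""
    n = len(data)
    if n < 3:
        raise ValueError("the factor sqrt(n(n-1))/(n-2) requires n >= 3")
    mean = sum(data) / n
    # Second and third moments about the mean.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


print(skewness([1, 2, 3]))                      # 0.0 (symmetric data)
print(round(skewness([1, 1, 1, 1, 6]), 4))      # 2.2361 (right-skewed)
print(round(skewness([-6, -1, -1, -1, -1]), 4)) # -2.2361 (left-skewed)
```
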


          4.6 THE BOXPLOT
          4.6.1 The Five-Number Summary

          The five-number summary of a sequence of data consists of the smallest
          observation, the first quartile, the median (= second quartile), the third
          quartile and the largest observation. These five numbers provide a summary
          of the statistical characteristics of a sequence of data.
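
          A five-number summary can be obtained with a few lines of Python. This is a minimal sketch: the function name five_number_summary and the sample values are hypothetical, and the "inclusive" quantile method is an assumed convention that may give slightly different quartiles than the definition used in this book.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # n=4 yields the three quartiles; the second quartile is the median.
    # method="inclusive" is an assumption, not the book's stated convention.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# (1, 3.0, 5.0, 7.0, 9)
```
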

          4.6.2 Boxplots
          A boxplot is a convenient way of graphically depicting the five-number
          summary, and may even provide more information. It consists of a box