2.2 Basic Statistical Descriptions of Data
The quartiles give an indication of a distribution’s center, spread, and shape. The first
quartile, denoted by Q1, is the 25th percentile. It cuts off the lowest 25% of the data.
The third quartile, denoted by Q3, is the 75th percentile; it cuts off the lowest 75% (or
highest 25%) of the data. The second quartile is the 50th percentile. As the median, it
gives the center of the data distribution.
The distance between the first and third quartiles is a simple measure of spread
that gives the range covered by the middle half of the data. This distance is called the
interquartile range (IQR) and is defined as
IQR = Q3 − Q1. (2.5)
Example 2.10 Interquartile range. The quartiles are the three values that split the sorted data set into
four equal parts. The data of Example 2.6 contain 12 observations, already sorted in
increasing order. Thus, the quartiles for these data are the third, sixth, and ninth values,
respectively, in the sorted list. Therefore, Q1 = $47,000 and Q3 = $63,000, and
the interquartile range is IQR = $63,000 − $47,000 = $16,000. (Note that the sixth value,
$52,000, is a median, although this data set has two medians since the number of data values
is even.)
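The positional rule used in this example can be sketched in a few lines of Python. This is a minimal sketch, not a general quantile routine: the helper name `positional_quartiles` is hypothetical, it assumes the sample size is a multiple of 4, and the salary values (in thousands of dollars) are illustrative figures chosen so that the third, sixth, and ninth entries match the quartiles quoted above. They may not be the exact figures of Example 2.6. Statistical packages often interpolate between neighboring values and can return slightly different quartiles.

```python
def positional_quartiles(values):
    """Return (Q1, Q2, Q3) by position in the sorted data.

    Follows the convention of the example: the k-th quartile is the
    (k * n / 4)-th value of the sorted list. Assumes n is a multiple of 4.
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a non-empty sample whose size is a multiple of 4")
    step = n // 4
    # Convert 1-based positions (step, 2*step, 3*step) to 0-based indices.
    return data[step - 1], data[2 * step - 1], data[3 * step - 1]


# Illustrative salaries in thousands of dollars (an assumption, see above).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

q1, q2, q3 = positional_quartiles(salaries)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 47 52 63 16
```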
Five-Number Summary, Boxplots, and Outliers
No single numeric measure of spread (e.g., IQR) is very useful for describing skewed
distributions. Have a look at the symmetric and skewed data distributions of Figure 2.1.
In the symmetric distribution, the median (and other measures of central tendency)
splits the data into equal-size halves. This does not occur for skewed distributions.
Therefore, it is more informative to also provide the two quartiles Q1 and Q3, along
with the median. A common rule of thumb for identifying suspected outliers is to
single out values falling at least 1.5 × IQR above the third quartile or below the first
quartile.
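As a rough illustration of this rule of thumb, the sketch below flags values that lie at least 1.5 × IQR above Q3 or below Q1. The function name `suspected_outliers` and the parameter `k` are hypothetical, the quartiles are passed in rather than computed so that the sketch does not commit to a particular quartile convention, and the sample values are illustrative only.

```python
def suspected_outliers(values, q1, q3, k=1.5):
    """Return the values at least k * IQR below q1 or above q3."""
    iqr = q3 - q1
    low = q1 - k * iqr    # lower fence
    high = q3 + k * iqr   # upper fence
    # "At least" k * IQR away, so values exactly on a fence are flagged too.
    return [v for v in values if v <= low or v >= high]


# Illustrative salaries in thousands of dollars, with Q1 = 47 and Q3 = 63.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
print(suspected_outliers(salaries, q1=47, q3=63))  # [110]
```

With Q1 = 47 and Q3 = 63, the IQR is 16, so the fences sit at 47 − 24 = 23 and 63 + 24 = 87. Only 110 falls outside them.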
Because Q1, the median, and Q3 together contain no information about the endpoints
(e.g., tails) of the data, a fuller summary of the shape of a distribution can be
obtained by providing the lowest and highest data values as well. This is known as
the five-number summary. The five-number summary of a distribution consists of the
median (Q2), the quartiles Q1 and Q3, and the smallest and largest individual observations,
written in the order of Minimum, Q1, Median, Q3, Maximum.
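The five-number summary can be sketched as a small record type. The names `FiveNumberSummary` and `five_number_summary` are hypothetical, the quartiles are taken by position (assuming a sample size that is a multiple of 4, as in Example 2.10), and the input values are illustrative rather than taken from the text.

```python
from typing import NamedTuple, Sequence


class FiveNumberSummary(NamedTuple):
    # Fields are listed in the order Minimum, Q1, Median, Q3, Maximum.
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def five_number_summary(values: Sequence[float]) -> FiveNumberSummary:
    """Build the summary using positional quartiles (n must be a multiple of 4)."""
    data = sorted(values)
    n = len(data)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a non-empty sample whose size is a multiple of 4")
    step = n // 4
    return FiveNumberSummary(
        minimum=data[0],
        q1=data[step - 1],
        median=data[2 * step - 1],
        q3=data[3 * step - 1],
        maximum=data[-1],
    )


# Illustrative salaries in thousands of dollars.
summary = five_number_summary([30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110])
print(tuple(summary))  # (30, 47, 52, 63, 110)
```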
Boxplots are a popular way of visualizing a distribution. A boxplot incorporates the
five-number summary as follows:
- Typically, the ends of the box are at the quartiles so that the box length is the
  interquartile range.
- The median is marked by a line within the box.
- Two lines (called whiskers) outside the box extend to the smallest (Minimum) and
  largest (Maximum) observations.
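To make the three components concrete, the following sketch draws a one-line text boxplot from a five-number summary. It is only an illustration: the function `ascii_boxplot`, its `width` parameter, and the drawing characters are all hypothetical choices, and the summary values passed in are illustrative. The box runs from Q1 to Q3, `M` marks the median, and the whiskers extend to the minimum and maximum.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=81):
    """Render a five-number summary as a single line of text."""
    if not minimum <= q1 <= median <= q3 <= maximum:
        raise ValueError("summary values must be in non-decreasing order")
    if minimum == maximum:
        raise ValueError("minimum and maximum must differ to define a scale")

    def col(x):
        # Map a data value to a column index between 0 and width - 1.
        return round((x - minimum) / (maximum - minimum) * (width - 1))

    line = [" "] * width
    for i in range(col(minimum), col(maximum) + 1):
        line[i] = "-"              # whiskers span minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="              # box spans the interquartile range
    line[col(minimum)] = "|"       # whisker ends
    line[col(maximum)] = "|"
    line[col(q1)] = "["            # box ends at the quartiles
    line[col(q3)] = "]"
    line[col(median)] = "M"        # median line within the box
    return "".join(line)


# Illustrative summary in thousands of dollars: Minimum, Q1, Median, Q3, Maximum.
plot = ascii_boxplot(30, 47, 52, 63, 110)
print(plot)
```

With a width of 81 and a data range of 80, each column corresponds to one unit, so the box length in characters equals the IQR of 16.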