Page 106 - Statistics for Dummies
Part II: Number-Crunching Basics
To find Q1 and Q3 you use the steps shown in the section “Calculating
percentiles,” with n = 25. Step 1 is done because the data are ordered. For Step
2, since Q1 is the 25th percentile, multiply 0.25 ∗ 25 = 6.25. This is not a whole
number, so Step 3a says to round it up to 7 and proceed to Step 3b.
Following Step 3b, you count from left to right in the data set until you reach
the 7th number, 68; this is Q1. For Q3 (the 75th percentile) you multiply 0.75
∗ 25 = 18.75, which you round up to 19. The 19th number on the list is 89, so
that’s Q3. Putting it all together, the five-number summary for these 25 test
scores is 43, 68, 77, 89, and 99. To best interpret a five-number summary, you
can use a boxplot; see Chapter 7 for details.
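If you want to check the arithmetic, here is a minimal Python sketch of the round-up-and-count rule. Treat it as an illustration under assumptions, not the book's own code: the function name percentile_by_position is made up, and the scores list is a stand-in (the actual 25 scores appear earlier in the chapter and are not reproduced here; only the values at positions 1, 7, 13, 19, and 25 come from the text above). The whole-number branch, which averages the value at that position with the next one, is also an assumption, because this passage only walks through Steps 3a and 3b.

```python
def percentile_by_position(sorted_data, k):
    """Return the k-th percentile (integer k from 1 to 99) of already-sorted data."""
    if not 1 <= k <= 99:
        raise ValueError("k must be an integer from 1 to 99")
    n = len(sorted_data)
    # Step 2: take k percent of n. Integer arithmetic avoids float rounding.
    whole, remainder = divmod(k * n, 100)
    if remainder:
        # Steps 3a/3b: not a whole number, so round up and count to that
        # position. Rounding up gives 1-based position whole + 1, which is
        # 0-based index `whole`.
        return sorted_data[whole]
    # ASSUMPTION (not shown in this passage): for a whole number, average
    # the value at that position and the one after it.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


# Stand-in data: 25 sorted scores. Only positions 1, 7, 13, 19, and 25
# (43, 68, 77, 89, 99) are confirmed by the text; the rest are illustrative.
scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

q1 = percentile_by_position(scores, 25)      # 0.25 * 25 = 6.25, round up to 7
median = percentile_by_position(scores, 50)  # 0.50 * 25 = 12.5, round up to 13
q3 = percentile_by_position(scores, 75)      # 0.75 * 25 = 18.75, round up to 19

five_number_summary = (scores[0], q1, median, q3, scores[-1])
print(five_number_summary)  # (43, 68, 77, 89, 99)
```

Be aware that statistical software often interpolates between neighboring values, so a library routine may report slightly different quartiles than this counting rule.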
Exploring interquartile range
The purpose of the five-number summary is to give descriptive statistics for
center, variation, and relative standing all in one shot. The measure of center
in the five-number summary is the median, and the first quartile, median, and
third quartile are measures of relative standing.
To obtain a measure of variation based on the five-number summary, you can
find what’s called the interquartile range (or IQR). The IQR equals Q3 – Q1 (that
is, the 75th percentile minus the 25th percentile) and reflects the distance
taken up by the innermost 50% of the data. If the IQR is small, you know a lot
of data are close to the median. If the IQR is large, you know the data are more
spread out from the median. The IQR for the test scores data set is 89 – 68 =
21, which is fairly large, seeing as how test scores only go from 0 to 100.
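The IQR calculation is a single subtraction, sketched below in Python. The variable names are illustrative, and comparing the IQR with the width of the 0-to-100 scale is just one informal way (an assumption here, not a standard rule) to put a number on “fairly large.”

```python
q1, q3 = 68, 89          # quartiles of the test scores, from the text
iqr = q3 - q1            # distance covered by the middle 50% of the data

scale_width = 100 - 0    # test scores run from 0 to 100
share_of_scale = iqr / scale_width

print(iqr)               # 21
print(share_of_scale)    # 0.21
```

In other words, the middle half of the scores spans about a fifth of the entire possible scale.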
The interquartile range is a more outlier-resistant measure of variation than
the regular range (maximum value minus minimum value; see the section “Being
out of range” earlier in this chapter). That’s because the regular range depends
entirely on the two most extreme values, while the interquartile range doesn’t
take outliers into account; it cuts them out of the picture by focusing only
on the distance within the middle 50% of the data (that is, between the
25th and 75th percentiles).
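To see the effect of an outlier on each measure, you can try a small experiment like the Python sketch below. Both data sets are hypothetical, invented purely for illustration, and the helper names are made up. The quartiles use the same round-up-and-count rule as in the test scores example; the whole-number branch is an assumption that this passage doesn't spell out.

```python
def percentile_by_position(sorted_data, k):
    """k-th percentile (integer k, 1 to 99) by the round-up-and-count rule."""
    whole, remainder = divmod(k * len(sorted_data), 100)
    if remainder:
        return sorted_data[whole]  # rounded-up position, as a 0-based index
    # ASSUMPTION: whole-number case averages this value and the next one.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


def iqr(sorted_data):
    """Interquartile range: Q3 minus Q1."""
    return percentile_by_position(sorted_data, 75) - percentile_by_position(sorted_data, 25)


def value_range(sorted_data):
    """Regular range: maximum minus minimum."""
    return sorted_data[-1] - sorted_data[0]


# Hypothetical sorted scores; the second list swaps the lowest score for an outlier.
typical = [60, 65, 70, 72, 75, 78, 80, 85, 90]
with_outlier = [5, 65, 70, 72, 75, 78, 80, 85, 90]

print(value_range(typical), value_range(with_outlier))  # 30 85
print(iqr(typical), iqr(with_outlier))                  # 10 10
```

One extreme score nearly triples the range, while the IQR stays put because Q1 and Q3 sit inside the middle of the data.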
Descriptive statistics that are well chosen and used correctly can tell you a
great deal about a data set, such as where the center is located, how diverse
the data are, and where a good portion of the data lies. However, descriptive
statistics can’t tell you everything about the data, and in some cases they
can be misleading. Be on the lookout for situations where a different statistic
would be more appropriate (for example, the median describes center more
fairly than the mean when the data are skewed), and keep your eyes peeled for
situations where critical statistics are missing (for example, when a mean is
reported without a corresponding standard deviation).
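A quick way to see why the choice of statistic matters is to compare the mean and median on skewed data. The sketch below uses Python's built-in statistics module on a small hypothetical data set (the values are invented for illustration) in which one large value drags the mean upward.

```python
import statistics

# Hypothetical right-skewed data: six similar values and one very large one.
values = [30, 32, 35, 36, 38, 40, 250]

mean = statistics.mean(values)      # pulled upward by the single large value
median = statistics.median(values)  # middle value, unaffected by how large 250 is
spread = statistics.stdev(values)   # sample standard deviation

print(f"mean = {mean:.1f}, median = {median}, standard deviation = {spread:.1f}")
```

Here the mean lands well above every value except the largest, while the median sits among the typical values. Reporting the standard deviation alongside the mean at least signals how spread out the data are.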