09-ch02-039-082-9780123814791
HAN
2011/6/1
2.2 Basic Statistical Descriptions of Data
packages include bar charts, pie charts, and line graphs. Other popular displays of data
summaries and distributions include quantile plots, quantile–quantile plots, histograms,
and scatter plots.
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
In this section, we look at various ways to measure the central tendency of data. Suppose
that we have some attribute X, like salary, which has been recorded for a set of objects.
Let $x_1, x_2, \ldots, x_N$ be the set of N observed values or observations for X. Here, these values
may also be referred to as the data set (for X). If we were to plot the observations
for salary, where would most of the values fall? This gives us an idea of the central tendency
of the data. Measures of central tendency include the mean, median, mode, and
midrange.
The most common and effective numeric measure of the “center” of a set of data is
the (arithmetic) mean. Let $x_1, x_2, \ldots, x_N$ be a set of N values or observations, such as for
some numeric attribute X, like salary. The mean of this set of values is
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}. \tag{2.1}$$
This corresponds to the built-in aggregate function, average (avg() in SQL), provided in
relational database systems.
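As a minimal sketch, Eq. (2.1) can be written directly in Python and cross-checked against SQL's avg() using the standard-library `sqlite3` module with an in-memory database. The helper name `arithmetic_mean`, the table name `emp`, the column `salary`, and the sample values are all hypothetical, chosen only for illustration.

```python
import sqlite3


def arithmetic_mean(values):
    """Eq. (2.1): the sum of the N observations divided by N."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical salary observations (in thousands of dollars).
salaries = [40, 45, 55, 60]

# Cross-check against the SQL aggregate avg() in an in-memory SQLite database.
# The table "emp" and column "salary" are illustrative names only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (salary REAL)")
conn.executemany("INSERT INTO emp (salary) VALUES (?)", [(s,) for s in salaries])
(sql_avg,) = conn.execute("SELECT avg(salary) FROM emp").fetchone()
conn.close()

print(arithmetic_mean(salaries), sql_avg)  # 50.0 50.0
```

One difference to keep in mind: SQL avg() skips NULL entries, so a column with missing values would need equivalent filtering on the Python side before applying Eq. (2.1).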
Example 2.6 Mean. Suppose we have the following values for salary (in thousands of dollars), shown
in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. Using Eq. (2.1), we have
$$\bar{x} = \frac{30 + 36 + 47 + 50 + 52 + 52 + 56 + 60 + 63 + 70 + 70 + 110}{12} = \frac{696}{12} = 58.$$
Thus, the mean salary is $58,000.
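The arithmetic in Example 2.6 can be verified with a few lines of Python. This is only a check of the example's numbers: the sum and count give the numerator and denominator of Eq. (2.1), and the standard-library `statistics.mean` is used as an independent cross-check.

```python
import statistics

# Salary values from Example 2.6 (in thousands of dollars), in increasing order.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

total = sum(salaries)        # numerator of Eq. (2.1)
n = len(salaries)            # N, the number of observations
mean_by_formula = total / n
mean_by_library = statistics.mean(salaries)

# Both routes should agree on the mean salary.
assert mean_by_formula == mean_by_library
print(total, n, mean_by_formula)  # 696 12 58.0
```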
Sometimes, each value $x_i$ in a set may be associated with a weight $w_i$ for $i = 1, \ldots, N$.
The weights reflect the significance, importance, or occurrence frequency attached to
their respective values. In this case, we can compute
$$\bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}. \tag{2.2}$$
This is called the weighted arithmetic mean or the weighted average.
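When the weights in Eq. (2.2) are occurrence frequencies, the weighted mean over the distinct values equals the plain mean over the full list. The sketch below illustrates this with the salary data of Example 2.6, collapsing repeated values into (value, count) pairs; the helper name `weighted_mean` is a hypothetical name, not taken from the text.

```python
from collections import Counter


def weighted_mean(values, weights):
    """Eq. (2.2): the sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Salary data from Example 2.6 (thousands of dollars).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Use the occurrence frequency of each distinct value as its weight.
counts = Counter(salaries)
distinct = sorted(counts)
freqs = [counts[x] for x in distinct]

print(distinct)                        # [30, 36, 47, 50, 52, 56, 60, 63, 70, 110]
print(freqs)                           # [1, 1, 1, 1, 2, 1, 1, 1, 2, 1]
print(weighted_mean(distinct, freqs))  # 58.0
```

With every weight equal to 1, Eq. (2.2) reduces to Eq. (2.1), so the same helper also computes the ordinary arithmetic mean.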