                                                              2.2 Basic Statistical Descriptions of Data  45


                               packages include bar charts, pie charts, and line graphs. Other popular displays of data
                               summaries and distributions include quantile plots, quantile–quantile plots, histograms,
                               and scatter plots.


                         2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
                               In this section, we look at various ways to measure the central tendency of data. Suppose
                               that we have some attribute X, like salary, which has been recorded for a set of objects.
                               Let $x_1, x_2, \dots, x_N$ be the set of N observed values or observations for X. Here, these
                               values may also be referred to as the data set (for X). If we were to plot the observations
                               for salary, where would most of the values fall? This gives us an idea of the central
                               tendency of the data. Measures of central tendency include the mean, median, mode, and
                               midrange.
                                 The most common and effective numeric measure of the “center” of a set of data is
                               the (arithmetic) mean. Let $x_1, x_2, \dots, x_N$ be a set of N values or observations, such as for
                               some numeric attribute X, like salary. The mean of this set of values is
                               \[
                               \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}. \tag{2.1}
                               \]
                               This corresponds to the built-in aggregate function, average (avg() in SQL), provided in
                               relational database systems.
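
As a minimal sketch of the SQL aggregate just mentioned, the following uses Python's standard sqlite3 module with an in-memory database. The table name employee, the column name salary, and the three sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table, column, and values; none of these come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (salary REAL)")
conn.executemany(
    "INSERT INTO employee (salary) VALUES (?)",
    [(10.0,), (20.0,), (30.0,)],
)

# AVG() divides the sum of the non-NULL values by the number of such values,
# which matches Eq. (2.1) when no value is missing.
(avg_salary,) = conn.execute("SELECT AVG(salary) FROM employee").fetchone()
conn.close()

print(avg_salary)  # 20.0
```

Note that SQL's avg() skips NULL entries, so with missing data the divisor is the count of recorded values rather than the number of rows.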

                  Example 2.6 Mean. Suppose we have the following values for salary (in thousands of dollars), shown
                               in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. Using Eq. (2.1), we have
                               \[
                               \bar{x} = \frac{30 + 36 + 47 + 50 + 52 + 52 + 56 + 60 + 63 + 70 + 70 + 110}{12} = \frac{696}{12} = 58.
                               \]
                               Thus, the mean salary is $58,000.
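
The calculation in Example 2.6 can be reproduced with a short script. This is only an illustrative sketch: the function name arithmetic_mean and the variable names are assumptions, not part of the text.

```python
# Salary values from Example 2.6, in thousands of dollars.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def arithmetic_mean(values):
    """Return the arithmetic mean of Eq. (2.1): the sum of the values divided by N."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


x_bar = arithmetic_mean(salaries)
print(sum(salaries), len(salaries), x_bar)  # 696 12 58.0
```

Python's standard library offers statistics.mean for the same purpose; the explicit version is shown here only to mirror Eq. (2.1) term by term.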

                                 Sometimes, each value $x_i$ in a set may be associated with a weight $w_i$ for $i = 1, \dots, N$.
                               The weights reflect the significance, importance, or occurrence frequency attached to
                               their respective values. In this case, we can compute
                               \[
                               \bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}. \tag{2.2}
                               \]
                               This is called the weighted arithmetic mean or the weighted average.
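
One way to see Eq. (2.2) at work is to use occurrence frequencies as weights. The sketch below takes the distinct salary values from Example 2.6 and weights each by how often it occurs, which should give the same result as the unweighted mean of the full list. The function name weighted_mean and the variable names are illustrative assumptions.

```python
# Distinct salary values from Example 2.6 (in thousands of dollars)
# and how many times each one occurs in the original list.
values = [30, 36, 47, 50, 52, 56, 60, 63, 70, 110]
counts = [1, 1, 1, 1, 2, 1, 1, 1, 2, 1]


def weighted_mean(xs, ws):
    """Return the weighted arithmetic mean of Eq. (2.2)."""
    if len(xs) != len(ws):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(ws, xs)) / total_weight


print(weighted_mean(values, counts))  # 58.0
```

When every weight equals 1, the denominator becomes N and Eq. (2.2) reduces to Eq. (2.1).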