Page 80 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data   59


           type data, one can also compute the mean using the absolute frequencies (counts),
           $n_k$, of each distinct value $x_k$:

              $$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} n_k x_k \quad \text{with} \quad n = \sum_{k=1}^{n} n_k . \tag{2.6}$$
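
The frequency-based mean can be illustrated with a minimal Python sketch. The data values and the helper name `mean_from_counts` are hypothetical, chosen only for illustration; the sketch simply checks that weighting each distinct value by its count gives the same result as averaging the raw cases.

```python
from collections import Counter


def mean_from_counts(counts):
    """Mean from a mapping {distinct value x_k: absolute frequency n_k}."""
    n = sum(counts.values())  # n is the sum of the counts n_k
    if n == 0:
        raise ValueError("the frequency table is empty")
    return sum(n_k * x_k for x_k, n_k in counts.items()) / n


# Hypothetical discrete-type data (not taken from the text).
data = [2, 3, 3, 5, 5, 5, 7]
counts = Counter(data)            # counts of each distinct value

freq_mean = mean_from_counts(counts)
direct_mean = sum(data) / len(data)
print(freq_mean, direct_mean)     # the two values coincide
```

For long datasets with few distinct values, the frequency table is a much shorter summary than the raw list, which is the practical appeal of this form.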

              If one has a frequency table of continuous-type data (also known in some
           literature as grouped data), with r bins, one can obtain an estimate of $\bar{x}$, using the
           frequencies $f_j$ of the bins and the mid-bin values, $\dot{x}_j$, as follows:

              $$\hat{\bar{x}} = \frac{1}{n} \sum_{j=1}^{r} f_j \dot{x}_j . \tag{2.7}$$

              This mean estimate used to be presented as an expeditious way of calculating the
           arithmetic mean for long tables of data. With the advent of statistical software, the
           value of such a method is questionable at best. We will proceed no further with
           such a “grouped data” approach.
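
For completeness, the grouped-data estimate can be sketched in a few lines of Python. The bin edges and frequencies are hypothetical, and the helper name `grouped_mean` is an assumption made for this illustration; the mid-bin value is taken as the midpoint of each bin's two edges.

```python
def grouped_mean(bin_edges, freqs):
    """Estimate the mean from r bins as (1/n) * sum of f_j * mid-bin value."""
    if len(bin_edges) != len(freqs) + 1:
        raise ValueError("r bins require r + 1 bin edges")
    n = sum(freqs)  # total number of cases
    if n == 0:
        raise ValueError("the frequency table is empty")
    # Mid-bin value of each bin: midpoint of its lower and upper edge.
    mids = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return sum(f * m for f, m in zip(freqs, mids)) / n


# Hypothetical table with r = 3 bins: [0, 10), [10, 20), [20, 30).
edges = [0, 10, 20, 30]
freqs = [2, 5, 3]

estimate = grouped_mean(edges, freqs)
print(estimate)  # (2*5 + 5*15 + 3*25) / 10 = 16.0
```

The result is only an estimate because every case in a bin is treated as if it sat at the mid-bin value.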
              Sometimes, when a dataset exhibits outliers and extreme cases
           (see 2.2.4) that can be suspected to be the result of gross measurement errors, one
           can use a trimmed mean by discarding a certain percentage of the tail cases (e.g.,
           5%).
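
A trimmed mean can be sketched as follows. This is a minimal illustration under a stated assumption: the given proportion is discarded from each tail (some software instead interprets the percentage as the total over both tails). The sample values and the helper name `trimmed_mean` are hypothetical.

```python
import math


def trimmed_mean(data, proportion=0.05):
    """Mean after discarding `proportion` of the cases from EACH tail.

    Assumption: the proportion applies per tail, and the number of
    discarded cases per tail is rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    k = math.floor(proportion * len(xs))  # cases dropped from each tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


# Hypothetical measurements with one suspicious extreme case (55.0).
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55.0]

plain = sum(sample) / len(sample)     # pulled upwards by the extreme case
trimmed = trimmed_mean(sample, 0.10)  # drops the lowest and highest case
print(plain, trimmed)
```

In this sample the plain mean is about 14.5, while the 10% trimmed mean stays close to 10, where the bulk of the cases lie.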
              The arithmetic mean is a point estimate of the expected value (true mean) of the
           random variable associated with the data and has the same properties as the true mean
           (see A.6.1). Note that the expected value can be interpreted as the center of gravity
           of a weightless rod with probability mass-points, in the case of discrete variables,
           or of a rod whose mass-density corresponds to the probability density function, in
           the case of continuous variables.


           2.3.1.2  Median

           The median of a dataset is that value of the data below which lie 50% of the cases.
           It is an estimate of the median, med(X), of the random variable, X, associated with the
           data, defined as:

              $$F_X(x) = \frac{1}{2} \;\Rightarrow\; \operatorname{med}(X) = x , \tag{2.8}$$

           where $F_X(x)$ is the distribution function of X.
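
The sample median can be illustrated with Python's standard `statistics` module. The data values are hypothetical. With an odd number of cases the median is the middle value of the sorted data; for an even number of cases, `statistics.median` averages the two middle values, which is one common convention.

```python
import statistics

# Hypothetical dataset with an odd number of cases.
data = [7, 1, 4, 9, 3, 8, 2]

med = statistics.median(data)       # middle value of the sorted data: 4

# Count the cases strictly below and strictly above the median.
below = sum(x < med for x in data)
above = sum(x > med for x in data)
print(med, below, above)            # 4 3 3
```

The same number of cases lies on each side of the median, the sample counterpart of the 50% condition on the distribution function.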
              Note that, using the previous rod analogy for the continuous variable case, the
           median divides the rod into equal mass halves corresponding to equal areas under
           the density curve:

              $$\int_{-\infty}^{\operatorname{med}(X)} f_X(x)\,dx = \int_{\operatorname{med}(X)}^{\infty} f_X(x)\,dx = \frac{1}{2} .$$
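
The equal-area property can be checked numerically. The sketch below assumes, purely for illustration, an exponential distribution with rate 2, for which the distribution function is 1 - exp(-rate * x) and the median is therefore ln(2) / rate. The distribution choice, the midpoint-rule helper `integrate`, and the truncation of the upper tail at x = 40 are all assumptions of this example.

```python
import math

RATE = 2.0  # hypothetical rate of an exponential distribution


def pdf(x):
    """Exponential probability density function."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


# Solving F(x) = 1 - exp(-RATE * x) = 1/2 gives the median.
median = math.log(2) / RATE


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# The density is zero below 0, so the left integral starts there.
left = integrate(pdf, 0.0, median)
# The upper tail is truncated at 40; the mass beyond it is exp(-80).
right = integrate(pdf, median, 40.0)
print(left, right)  # both approximately 0.5
```

Note that the mean of this distribution is 1 / rate = 0.5, larger than the median of about 0.347: the center of gravity and the equal-mass point of the rod need not coincide.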