Page 84 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data


              The sample variance is the point estimate of the variance of the associated
           random variable (see Appendices B and C). It can be interpreted as the mean square
           deviation (or mean square error, MSE) of the sample values from their mean. The
           use of the factor n – 1, instead of n as in the usual computation of a mean, is
           explained in C.2. Notice also that, given the sample mean x̄, only n – 1 cases can
           vary independently in order to achieve the same variance, since the deviations
           from x̄ must sum to zero. We say that the variance has df = n – 1 degrees of
           freedom. The mean, on the other hand, has n degrees of freedom.
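
The n – 1 divisor and the degrees-of-freedom argument can be checked numerically. The following is a minimal sketch in Python; the data values and variable names are hypothetical and chosen only for illustration, not taken from any dataset in the book.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(x)
mean = sum(x) / n

# Deviations from the sample mean always sum to zero, so once n - 1 of
# them are known the last one is fixed: hence n - 1 degrees of freedom.
deviations = [xi - mean for xi in x]
last_deviation = -sum(deviations[:-1])

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(d * d for d in deviations) / (n - 1)

# The standard library uses the same n - 1 divisor.
print(sample_variance, statistics.variance(x))
```

Here `last_deviation` is recovered from the other n – 1 deviations alone, which is one way to read the statement that only n – 1 cases can vary independently once the mean is given.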


           2.3.2.4 Standard Deviation
           The standard deviation of a dataset is the square root of its variance. It is,
           therefore, a root mean square error (RMSE):

$$ s = \sqrt{v} = \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1) \right]^{1/2}. \tag{2.13} $$
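
As a minimal sketch of formula 2.13, the standard deviation can be computed directly from its definition and compared with a library routine. The function name `sample_std` and the data values are assumptions made for this illustration.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with the n - 1 denominator (formula 2.13)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical values
s = sample_std(data)

# statistics.stdev also uses the n - 1 denominator, so the two should agree.
print(s, statistics.stdev(data))
```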

              The standard deviation is preferable to the variance as a measure of spread,
           since it is expressed in the same units as the original data. Furthermore, many
           interesting results about the spread of a distribution are expressed in terms of the
           standard deviation. For instance, for any random variable X, the Chebyshev
           Theorem tells us that (see A.6.3):

$$ P(|X - \mu| > k\sigma) \le \frac{1}{k^2}. $$

              Using s as a point estimate of σ and taking k = 2, we can then expect that, for
           any dataset distribution, at least 1 − 1/4 = 75 % of the cases lie within 2 standard
           deviations of the mean.
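
The 75 % figure can be illustrated on a deliberately skewed sample. This is only a sketch: the helper `fraction_within` and the data are hypothetical, and the check uses the sample standard deviation s in place of σ, as the text suggests.

```python
import statistics


def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * s)
    return inside / len(values)


# Hypothetical, strongly skewed sample with one large outlier.
skewed = [1, 1, 1, 2, 2, 3, 3, 4, 5, 60]
k = 2
bound = 1 - 1 / k**2  # Chebyshev lower bound for the fraction inside
frac = fraction_within(skewed, k)
print(frac, bound)
```

The bound is distribution-free, so it is usually far from tight; for bell-shaped data the fraction within 2 standard deviations is typically much higher than 75 %.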

           Example 2.6
           Q: Consider the Cork Stoppers’ dataset. Determine the measures of spread of
           the variable PRT. Imagine that we had a  new  variable,  PRT1,  obtained by the
           following linear transformation  of PRT: PRT1 = 0.2 PRT + 5. Determine the
           variance of PRT1.
           A: Table 2.7 shows measures of spread of the variable PRT. The sample variance
           enjoys the same linear transformation property as the true variance (see A.6.1). For
           the PRT1 variable we have:

$$ \mathrm{variance}(\mathrm{PRT1}) = (0.2)^2 \, \mathrm{variance}(\mathrm{PRT}) = 5219. $$

              Note that the  addition  of a  constant to PRT (i.e., a  scale translation)  has  no
           effect on the variance.
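
The linear transformation property used in this example can be verified numerically. The sketch below uses made-up values standing in for PRT (they are not the Cork Stoppers data), with the same coefficients 0.2 and 5 as in the example.

```python
import statistics

# Hypothetical values standing in for PRT; NOT the Cork Stoppers data.
prt = [310.0, 452.0, 518.0, 675.0, 820.0, 1040.0]
a, b = 0.2, 5.0

# PRT1 = 0.2 PRT + 5
prt1 = [a * v + b for v in prt]

var_prt = statistics.variance(prt)
var_prt1 = statistics.variance(prt1)

# Adding a constant alone (a translation) leaves the variance unchanged.
var_shifted = statistics.variance([v + b for v in prt])

# The variance of PRT1 should equal a squared times the variance of PRT.
print(var_prt1, a**2 * var_prt)
print(var_shifted, var_prt)
```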