The sample variance is the point estimate of the variance of the associated random variable (see Appendices B and C). It can be interpreted as the mean square deviation (or mean square error, MSE) of the sample values from their mean. The use of n − 1 in the denominator, instead of n as in the usual computation of a mean, is explained in C.2. Notice also that, given x̄, only n − 1 cases can vary independently in the computation of the variance, since the deviations from x̄ must sum to zero. We say that the variance has df = n − 1 degrees of freedom. The mean, on the other hand, has n degrees of freedom.
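A quick way to see where the lost degree of freedom comes from is to note that the deviations from the sample mean always sum to zero; the following short R check (with arbitrary illustrative values, not the book's data) makes this explicit:

    # The n deviations from the sample mean always sum to zero, so fixing the
    # mean leaves only n - 1 values free to vary: one degree of freedom is lost.
    x <- c(3, 7, 8, 12, 15)     # arbitrary illustrative values
    d <- x - mean(x)            # deviations from the mean
    sum(d)                      # 0 (up to floating-point rounding)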
2.3.2.4 Standard Deviation
The standard deviation of a dataset is the square root of its variance. It is, therefore, a root mean square error (RMSE):

$$ s = \sqrt{v} = \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1) \right]^{1/2}. \qquad 2.13 $$
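As a minimal sketch in R (again with arbitrary illustrative values), the standard deviation of formula 2.13 can be computed directly and compared with R's built-in var and sd, which use the same n − 1 denominator:

    # Sketch: sample variance and standard deviation with the n - 1 denominator.
    x <- c(72, 85, 93, 101, 110, 118, 134)   # arbitrary illustrative values

    n <- length(x)
    v <- sum((x - mean(x))^2) / (n - 1)      # sample variance
    s <- sqrt(v)                             # standard deviation, formula 2.13

    # R's built-in var() and sd() use the same n - 1 denominator:
    all.equal(v, var(x))    # TRUE
    all.equal(s, sd(x))     # TRUE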
The standard deviation is preferable to the variance as a measure of spread, since it is expressed in the same units as the original data. Furthermore, many interesting results about the spread of a distribution are expressed in terms of the standard deviation. For instance, for any random variable X, the Chebyshev Theorem tells us that (see A.6.3):

$$ P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}. $$
Using s as a point estimate of σ, we can then expect that for any dataset distribution at least 75% of the cases lie within 2 standard deviations of the mean.
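This guarantee can be checked empirically; the following R sketch uses simulated, deliberately non-normal data (an assumption for illustration only, not one of the book's datasets):

    # Sketch: empirical check of Chebyshev's bound for k = 2.
    set.seed(1)
    x <- rexp(1000, rate = 0.5)    # simulated, markedly skewed data

    k <- 2
    inside <- abs(x - mean(x)) <= k * sd(x)
    mean(inside)                   # observed proportion within k standard deviations
    1 - 1 / k^2                    # Chebyshev's lower bound: 0.75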
Example 2.6
Q: Consider the Cork Stoppers’ dataset. Determine the measures of spread of
the variable PRT. Imagine that we had a new variable, PRT1, obtained by the
following linear transformation of PRT: PRT1 = 0.2 PRT + 5. Determine the
variance of PRT1.
A: Table 2.7 shows measures of spread of the variable PRT. The sample variance
enjoys the same linear transformation property as the true variance (see A.6.1). For
the PRT1 variable we have:
variance(PRT1) = (0.2)² variance(PRT) = 5219.
Note that the addition of a constant to PRT (i.e., a translation of the data) has no effect on the variance.
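A short R sketch illustrates the property; since the Cork Stoppers data file is not reproduced here, the vector prt below is a hypothetical stand-in for PRT:

    # Sketch of the linear transformation property: var(a*x + b) = a^2 * var(x).
    set.seed(2)
    prt  <- rnorm(150, mean = 700, sd = 360)  # hypothetical stand-in for PRT
    prt1 <- 0.2 * prt + 5                     # PRT1 = 0.2 PRT + 5

    var(prt1)                                 # equals (0.2)^2 * var(prt)
    0.04 * var(prt)
    # The added constant 5 (a translation) has no effect on the variance.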