Page 107 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
3 Estimating Data Parameters
interval, following the procedure explained in the previous section, is now
computed as:
$$\bar{x} - z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{x} + z_{1-\alpha/2}\,\sigma/\sqrt{n}. \tag{3.9}$$
As shown in Figure 3.3, with increasing n, the distribution of $\bar{X}$ gets more
peaked; therefore, the confidence intervals get narrower as n increases (the
precision of our estimate of the mean increases). This is precisely why computing
averages is so popular!
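The narrowing of the interval with n can be checked numerically. The following is a minimal Python sketch of formula 3.9, using only the standard library; the function name `z_interval` and the values of the sample mean (10) and σ (2) are hypothetical, chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval for the mean when sigma is known (formula 3.9)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # percentile z_{1-alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical values: sample mean 10, known sigma 2.
widths = {}
for n in (25, 100, 400):
    lo, hi = z_interval(10.0, 2.0, n)
    widths[n] = hi - lo
    print(f"n = {n:3d}: ({lo:.3f}, {hi:.3f}), width = {widths[n]:.3f}")
```

Since the half-width is proportional to 1/√n, quadrupling the sample size halves the width of the interval.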
In normal practice one does not know the exact value of σ and uses the previously
mentioned (2.3.2) point estimate s instead. In this case, the sampling distribution is
no longer the normal distribution. However, taking into account Property 3
described in section B.2.8, the following random variable:
$$T_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}},$$
has a Student’s t distribution with df = n – 1 degrees of freedom. The sample
standard deviation of $\bar{X}$, $s/\sqrt{n}$, is known as the standard error of the
statistic $\bar{x}$ and denoted SE.
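As an illustration, the standard error and the corresponding value of the t statistic can be computed as in the Python sketch below. The data values and the mean `mu0` are made up for the example; `statistics.stdev` uses the n – 1 denominator, matching the point estimate s.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample, invented for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0]
n = len(sample)

xbar = mean(sample)   # sample mean
s = stdev(sample)     # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)      # standard error of the mean, SE = s / sqrt(n)

mu0 = 4.9             # assumed value of the population mean (hypothetical)
t_value = (xbar - mu0) / se  # observed value of T with df = n - 1

print(f"mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.4f}, t = {t_value:.3f}, df = {n - 1}")
```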
We now compute the 1−α/2 percentile for the Student’s t distribution with
df = n – 1 degrees of freedom:
$$T_{n-1}(t) = 1 - \alpha/2 \;\Rightarrow\; t_{df,\,1-\alpha/2}, \tag{3.10}$$
and use this percentile in order to establish the two-sided confidence interval:
$$-t_{df,\,1-\alpha/2} < \frac{\bar{x} - \mu}{\mathrm{SE}} < t_{df,\,1-\alpha/2}, \tag{3.11}$$
or, equivalently:
$$\bar{x} - t_{df,\,1-\alpha/2}\,\mathrm{SE} < \mu < \bar{x} + t_{df,\,1-\alpha/2}\,\mathrm{SE}. \tag{3.12}$$
Since the Student’s t distribution is less peaked than the normal distribution, one
obtains larger intervals when using formula 3.12 than when using formula 3.9,
reflecting the added uncertainty about the true value of the standard deviation.
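The difference between the two intervals can be illustrated with the Python sketch below. The Python standard library has no Student's t percentile function, so the helpers `t_pdf`, `t_cdf` and `t_quantile` are hypothetical stand-ins written for this example: they integrate the t density with Simpson's rule and invert the distribution function by bisection. In practice one would use a statistical package instead. The data are made up.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Distribution function for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile t_{df,p} for p > 0.5, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it contains the percentile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sample, invented for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0]
n, alpha = len(sample), 0.05
xbar, se = mean(sample), stdev(sample) / sqrt(n)

t_crit = t_quantile(1 - alpha / 2, n - 1)       # t_{df, 1-alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1-alpha/2}
t_half, z_half = t_crit * se, z_crit * se

print(f"t interval (3.12):   ({xbar - t_half:.4f}, {xbar + t_half:.4f})")
print(f"normal percentiles:  ({xbar - z_half:.4f}, {xbar + z_half:.4f})")
```

With only 7 degrees of freedom the t percentile is noticeably larger than the normal one, so the interval from formula 3.12 is the wider of the two.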
When applying these results one must note that:
– For large n, the Central Limit theorem (see sections A.8.4 and A.8.5)
legitimises the assumption of normal distribution of $\bar{X}$ even when X is not
normally distributed (under very general conditions).
– For large n, the Student’s t distribution does not deviate significantly from
the normal distribution, and one can then use, for unknown σ, the same
percentiles derived from the normal distribution, as one would use in the
case of known σ.
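Both remarks can be explored with a small simulation. The Python sketch below draws repeated samples from an exponential distribution (clearly non-normal), builds the interval from the sample estimate s and the normal percentile, and counts how often the interval contains the true mean. The distribution, the seed, the sample size n = 100 and the number of trials are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(12345)            # fixed seed so the run is repeatable
z = NormalDist().inv_cdf(0.975)       # normal percentile for a 95% interval
true_mean = 1.0                       # mean of the exponential distribution with rate 1
n, trials = 100, 2000

covered = 0
for _ in range(trials):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    xbar = mean(sample)
    se = stdev(sample) / sqrt(n)      # sigma unknown: estimated by s
    if xbar - z * se < true_mean < xbar + z * se:
        covered += 1

coverage = covered / trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

For a sample size of this order, the observed fraction should come out close to the nominal 95%, even though X itself is far from normally distributed and σ was replaced by s.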