Page 110 - Modern Analytical Chemistry
Chapter 4 Evaluating Analytical Data 93
SOLUTION
This is an example of a paired data set since the acquisition of samples over an
extended period introduces a substantial time-dependent change in the
concentration of monensin. The comparison of the two methods must be done
with the paired t-test, using the following null and two-tailed alternative
hypotheses
$$H_0: \bar{d} = 0 \qquad H_A: \bar{d} \neq 0$$
Defining the difference between the methods as
$$d = X_{\text{elect}} - X_{\text{micro}}$$
we can calculate the difference for each sample
Sample 1 2 3 4 5 6 7 8 9 10 11
d 2.8 1.4 –3.0 6.0 –6.6 –0.5 9.7 12.7 –1.6 4.0 –0.2
The mean and standard deviation for the differences are 2.25 and 5.63,
respectively. The test statistic is
$$t_{\text{exp}} = \frac{\bar{d}\sqrt{n}}{s_d} = \frac{(2.25)\sqrt{11}}{5.63} = 1.33$$
which is smaller than the critical value of 2.23 for t(0.05, 10). Thus, the null
hypothesis is retained, and there is no evidence that the two methods yield
different results at the stated significance level.
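The arithmetic in this solution can be checked with a short script. The following is a minimal sketch, not part of the original text: it uses only the Python standard library, takes the differences and the critical value of 2.23 directly from the solution, and assumes the absolute value of the mean difference is used for the two-tailed comparison. Variable names are illustrative. Carrying full precision gives a test statistic of about 1.32 rather than the 1.33 obtained from the rounded mean and standard deviation; the conclusion is unchanged.

```python
import math
import statistics

# Differences d = X_elect - X_micro for the 11 samples, copied from the solution.
d = [2.8, 1.4, -3.0, 6.0, -6.6, -0.5, 9.7, 12.7, -1.6, 4.0, -0.2]

n = len(d)
d_bar = statistics.mean(d)   # mean of the differences
s_d = statistics.stdev(d)    # sample standard deviation (n - 1 in the denominator)

# Test statistic for the paired t-test; abs() is an assumption made here
# so that the two-tailed comparison works for a negative mean difference.
t_exp = abs(d_bar) * math.sqrt(n) / s_d

# Critical value t(0.05, 10) as quoted in the text (two-tailed, n - 1 = 10).
T_CRIT = 2.23

retain_null = t_exp < T_CRIT

print(f"mean = {d_bar:.2f}, s_d = {s_d:.2f}, t_exp = {t_exp:.2f}")
print("retain H0" if retain_null else "reject H0")
```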
A paired t-test can only be applied when the individual differences, $d_i$, belong
to the same population. This will only be true if the determinate and indeterminate
errors affecting the results are independent of the concentration of analyte in the
samples. If this is not the case, a single sample with a larger error could result in a
value of $d_i$ that is substantially larger than that for the remaining samples. Including
this sample in the calculation of $\bar{d}$ and $s_d$ leads to a biased estimate of the true mean
and standard deviation. For samples that span a limited range of analyte concentrations,
such as that in Example 4.21, this is rarely a problem. When paired data span
a wide range of concentrations, however, the magnitude of the determinate and
indeterminate sources of error may not be independent of the analyte’s concentration.
In such cases the paired t-test may give misleading results since the paired data
with the largest absolute determinate and indeterminate errors will dominate $\bar{d}$. In
this situation a comparison is best made using a linear regression, details of which
are discussed in the next chapter.
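To see why a wide concentration range is a problem, consider a purely hypothetical case in which one method reads 2% high at every concentration. The sketch below is an illustration only: the concentrations and the 2% relative bias are invented for the example and do not come from the text. It shows how the sample with the highest concentration can account for most of the mean difference.

```python
# Hypothetical illustration: these concentrations and the 2% relative bias
# are invented for this sketch and are not data from the text.
concentrations = [1.0, 10.0, 100.0, 1000.0]
RELATIVE_BIAS = 0.02  # assumed constant relative (proportional) error

# With a proportional error, each paired difference scales with concentration.
d = [RELATIVE_BIAS * c for c in concentrations]

d_bar = sum(d) / len(d)

# Fraction of the summed differences contributed by the largest sample.
share_of_largest = max(d) / sum(d)

print(f"differences = {d}")
print(f"mean difference = {d_bar:.3f}")
print(f"share from highest-concentration sample = {share_of_largest:.0%}")
```

In this invented case the differences do not belong to a single population, so the mean difference mostly reflects the highest-concentration sample.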
4F.5 Outliers
On occasion, a data set appears to be skewed by the presence of one or more data
points that are not consistent with the remaining data points. Such values are called
outliers. The most commonly used significance test for identifying outliers is Dixon’s
Q-test. The null hypothesis is that the apparent outlier is taken from the same population
as the remaining data. The alternative hypothesis is that the outlier comes from a
different population, and, therefore, should be excluded from consideration.

outlier: Data point whose value is much larger or smaller than the remaining data.

Dixon’s Q-test: Statistical test for deciding if an outlier can be removed from a set of data.

The Q-test compares the difference between the suspected outlier and its nearest
numerical neighbor to the range of the entire data set. Data are ranked from
smallest to largest so that the suspected outlier is either the first or the last data