Page 22 - Statistics for Environmental Engineers

[Figure 2.6: dot plots of repeated test results on an axis running from 7.0 to 9.0. Panel (a): tests on the same specimen. Panel (b): tests on different specimens from the same batch.]

                       FIGURE 2.6  Repeated tests from (a) a single specimen that reflect variation in the analytical measurement method and
                       (b) five specimens from a single batch that reflect variation due to collecting the test specimens and the measurement method.

  If we wish to compare two testing methods A and B, the correct basis is to compare five determinations made using test method A with five determinations made using test method B, with all tests made on portions of the same test specimen. These two sets of measurements are not influenced by variation between test specimens or by the method of collection.
  If we wish to compare two sampling methods, the correct basis is to compare five determinations made on five different specimens collected using sampling method A with those made on five specimens collected using sampling method B, with all specimens coming from the same batch. These two sets of data will contain variation due to the collection of the specimens and the testing method. They do not contain variation due to differences between batches.
  If the goal is to compare two different processes for making a product, the observations used as a basis for comparison should reflect variation due to differences between batches taken from the two processes.



                       Normality, Randomness, and Independence

                       The three important properties on which many statistical procedures rest are normality, randomness, and
                       independence. Of these, normality is the one that seems to worry people the most. It is not always the
                       most important.
 Normality means that the error term in a measurement, $e_i$, is assumed to come from a normal probability distribution. This is the familiar symmetrical bell-shaped distribution. There is a tendency for error distributions that result from many additive component errors to be “normal-like.” This is the central limit effect. It rests on the assumption that there are several sources of error, that no single source dominates, and that the overall error is a linear combination of independently distributed errors. These conditions seem very restrictive, but they often (though not always) exist. Even when they do not exist, lack of normality is not necessarily a serious problem. Transformations are available to make nonnormal errors “normal-like.”
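Both ideas in the paragraph above, the central limit effect and the use of a transformation, can be illustrated numerically. This is a hedged sketch under stated assumptions: each component error is assumed uniform on [-0.5, 0.5] (itself clearly non-normal), twelve such components are summed, and the skewed case is assumed to be a lognormal (multiplicative) error, for which a log transformation is the natural choice. A normal distribution has skewness 0 and about 68.3% of values within one standard deviation of the mean.

```python
import math
import random
import statistics


def composite_error(rng, n_sources=12):
    # Overall error as the sum of n_sources small, independent component errors,
    # none dominating. Each component is uniform on [-0.5, 0.5] (an assumption).
    return sum(rng.uniform(-0.5, 0.5) for _ in range(n_sources))


def skewness(xs):
    # Sample skewness: near 0 for a symmetric, normal-like distribution.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(42)
errors = [composite_error(rng) for _ in range(20000)]
m, s = statistics.fmean(errors), statistics.pstdev(errors)
within_1sd = sum(abs(e - m) <= s for e in errors) / len(errors)  # about 0.683 if normal

# Multiplicative (lognormal) errors are right-skewed; taking logs makes them normal.
raw = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]
logged = [math.log(x) for x in raw]

print(f"sum of uniforms: skewness {skewness(errors):.3f}, within 1 sd {within_1sd:.3f}")
print(f"lognormal: skewness {skewness(raw):.3f} raw, {skewness(logged):.3f} after log")
```

The summed errors come out nearly symmetric with close to the normal fraction inside one standard deviation, while the raw lognormal errors are strongly skewed until they are log-transformed.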
 Many commonly used statistical procedures, including those that rely directly on comparing averages (such as t tests to compare two averages and analysis of variance tests to compare several averages), are robust to deviations from normality. Robust means that the test tends to yield correct conclusions even when applied to data that are not normally distributed.
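One way to see what robustness means for the t test is to apply it repeatedly to two samples drawn from the same distribution, where every rejection is a false alarm, and check that the false-rejection rate stays near the nominal 5%. This is a minimal sketch assuming a pooled-variance two-sample t test with 10 observations per group (18 degrees of freedom, two-sided critical value 2.101); the strongly skewed exponential distribution stands in for non-normal data and is an illustrative choice, not one taken from the text.

```python
import math
import random
import statistics

T_CRIT = 2.101  # two-sided 5% critical value of Student's t with 18 degrees of freedom


def pooled_t(a, b):
    # Two-sample t statistic using the pooled estimate of the common variance.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def rejection_rate(draw, n=10, reps=5000, seed=7):
    # Both samples come from the same distribution, so each rejection is a false alarm.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(pooled_t(a, b)) > T_CRIT:
            rejections += 1
    return rejections / reps


normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0))
print(f"false-rejection rate, normal data:      {normal_rate:.3f}")
print(f"false-rejection rate, exponential data: {skewed_rate:.3f}")
```

Both rates should land close to 0.05, which is the practical sense in which the test "tends to yield correct conclusions" despite the non-normal errors.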
                        Random means that the observations are drawn from a population in a way that gives every element
                       of the population an equal chance of being drawn. Randomization of sampling is the best form of
                       insurance that observations will be independent.
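The definition of random sampling above, that every element has an equal chance of being drawn, can be checked empirically. This sketch assumes a hypothetical population of ten labeled specimens and draws simple random samples of three with Python's `random.Random.sample`, which selects distinct elements so that each element's chance of inclusion is k/N = 0.3; the labels and sizes are illustrative only.

```python
import random
from collections import Counter


def simple_random_sample(population, k, rng):
    # rng.sample draws k distinct elements with every subset of size k equally
    # likely, so each element is selected with the same probability k / N.
    return rng.sample(population, k)


population = [f"specimen-{i}" for i in range(1, 11)]  # hypothetical labels
rng = random.Random(2024)
trials = 20000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(population, 3, rng))

selection_rates = {item: counts[item] / trials for item in population}
for item, rate in selection_rates.items():
    print(f"{item}: selected in {rate:.3f} of samples")
```

Each specimen should be selected in roughly 30% of the samples. The same generator's `shuffle` method can be used to randomize the order in which the chosen specimens are analyzed.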


                       Example 2.6

    Errors in the nitrate laboratory data were checked for randomness by plotting the errors, $e_i = y_i - \eta$.
    If the errors are random, the plot will show no pattern. Figure 2.7 is such a plot, showing $e_i$ in order
    of observation. The plot does not suggest any reason to believe the errors are not random.
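A simple numeric companion to the time-order plot is to compute $e_i = y_i - \eta$ and count the runs of consecutive errors with the same sign: very few runs point to drift, and very many point to systematic alternation. This is a hedged sketch, not the book's procedure and not a formal runs test with significance levels; the two measurement series and the value of eta below are hypothetical and are not the nitrate data.

```python
def errors(ys, eta):
    """Return e_i = y_i - eta for each observation, in order of observation."""
    return [y - eta for y in ys]


def count_sign_runs(es):
    """Count runs of consecutive same-sign errors (exactly zero errors are skipped)."""
    signs = [e > 0 for e in es if e != 0]
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


# Hypothetical series (not the book's data); eta is the assumed true value.
eta = 8.0
drifting = [8.3, 8.2, 8.2, 8.1, 7.9, 7.8, 7.8, 7.7]      # high early, low late
alternating = [8.1, 7.9, 8.2, 7.8, 8.1, 7.9, 8.2, 7.8]   # flips sign every time

print("runs, drifting series:   ", count_sign_runs(errors(drifting, eta)))
print("runs, alternating series:", count_sign_runs(errors(alternating, eta)))
```

The drifting series yields only 2 runs (all positive errors, then all negative), while the alternating series yields 8; a random sequence of eight errors would usually fall somewhere in between.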
                       Imagine ways in which the errors of the nitrate measurements might be nonrandom. Suppose, for example,
                       that the measurement process drifted such that early measurements tended to be high and later measurements
                       low. A plot of the errors against time of analysis would show a trend (positive errors followed by negative
                       © 2002 By CRC Press LLC