Page 421 - Probability and Statistical Inference

398    8. Tests of Hypotheses

                                 where R denotes a generic rejection region for $H_0$.
                                    Example 8.2.2 (Example 8.2.1 Continued) In the case of Test #1,
                                 writing Z for a standard normal variable, we have:




                                 Proceeding similarly, we used MAPLE to prepare the following table for the
                                 values of α and β associated with the Tests #1-4 given by (8.2.1).
                                         Table 8.2.2. Values of α and β for Tests #1-4 from (8.2.1)

                                         Test #1         Test #2         Test #3         Test #4
                                         $R_1$           $R_2$           $R_3$           $R_4$
                                         α = .06681      α = .01696      α = .06681      α = .00000
                                         β = .15866      β = .07865      β = .00000      β = .06681
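
The entries of Table 8.2.2 can be checked numerically. The sketch below is not the MAPLE computation referred to above, and it rests on a setup that is not restated on this page: we assume nine independent N(θ, 1) observations, $H_0$: θ = 5.5 versus $H_1$: θ = 8, and rejection regions of the form "mean of the first m observations exceeds c" with (m, c) equal to (1, 7), (2, 7), (9, 6) and (9, 7.5). These choices are assumptions, adopted because they appear to reproduce the tabulated values to about four decimal places; (8.2.1) itself should be consulted for the actual definitions. All function and variable names are illustrative.

```python
import math


def std_normal_cdf(z):
    """Return P(Z <= z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# ASSUMED setup (not stated on this page): N(theta, 1) data, simple H0 versus simple H1.
THETA_0 = 5.5   # assumed value of theta under H0
THETA_1 = 8.0   # assumed value of theta under H1
SIGMA = 1.0     # assumed known standard deviation

# ASSUMED rejection regions: reject H0 when the mean of the first m observations exceeds c.
ASSUMED_TESTS = {
    "Test #1": (1, 7.0),
    "Test #2": (2, 7.0),
    "Test #3": (9, 6.0),
    "Test #4": (9, 7.5),
}


def error_probabilities(m, c):
    """Return (alpha, beta) for the test that rejects H0 when the mean of m observations exceeds c."""
    se = SIGMA / math.sqrt(m)                      # standard deviation of the mean of m observations
    alpha = std_normal_cdf(-(c - THETA_0) / se)    # P(reject H0 | H0 true) = P(Z > (c - theta0)/se)
    beta = std_normal_cdf((c - THETA_1) / se)      # P(accept H0 | H1 true) = P(Z <= (c - theta1)/se)
    return alpha, beta


results = {name: error_probabilities(m, c) for name, (m, c) in ASSUMED_TESTS.items()}

for name, (alpha, beta) in results.items():
    print(f"{name}: alpha = {alpha:.5f}, beta = {beta:.5f}")
```

Under these assumptions, Tests #1 and #3 share the same standardized cutoff under $H_0$, which would explain why their α values coincide while their β values differ so sharply.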
                                 Upon inspecting the entries in the Table 8.2.2, we can immediately conclude a
                                 few things. Between the Tests #1 and #2, we feel that the Test #2 appears
                                 better because both its error probabilities are smaller than the ones associated
                                 with the Test #1. Comparing the Tests #1 and #3 we can similarly say that
                                 Test #3 performs much better. In other words, while comparing Tests #1-3,
                                 we feel that the Test #1 certainly should not be in the running, but no clear-cut
                                 choice between Tests #2 and #3 emerges from this. Test #2 has the smaller
                                 value of α (.01696 versus .06681) but the larger value of β (.07865 versus
                                 .00000). If we must pick between Tests #2 and #3, then we have to take into
                                 consideration the consequences of committing either error in practice. It is
                                 clear that an experimenter may not be able to accomplish this by looking at
                                 the values of α and β alone. Tests #3-4 tell a slightly different story. By
                                 down-sizing the rejection region R for Test #4 in comparison with that of
                                 Test #3, we are able to make the α value for Test #4 practically zero, but
                                 this happens at the expense of a sharp rise in the value of β.
                                    From Table 8.2.2, we observe some special features which also hold in
                                 general. We summarize these as follows:
                                         Not all tests are comparable among themselves, as
                                         Tests #2-3 show. By suitably adjusting the rejection region R,
                                         we can make α (or β) as small as we would like, but then
                                         β (or α) will be on the rise as long as the sample size n is kept fixed.
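
The trade-off summarized above can be illustrated with a small sweep over cutoffs. This is only a sketch under assumed values that are not given on this page: a test that rejects $H_0$ when the sample mean exceeds a cutoff c, with n = 9 observations from N(θ, 1), θ = 5.5 under $H_0$ and θ = 8 under $H_1$. The function name `alpha_beta` and the list of cutoffs are hypothetical choices made for illustration.

```python
import math


def std_normal_cdf(z):
    """Return P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def alpha_beta(c, n=9, theta0=5.5, theta1=8.0, sigma=1.0):
    """Return (alpha, beta) for the test rejecting H0 when the sample mean exceeds c.

    The defaults for n, theta0, theta1 and sigma are ASSUMED values used for illustration.
    """
    se = sigma / math.sqrt(n)                     # standard deviation of the sample mean
    alpha = std_normal_cdf(-(c - theta0) / se)    # Type I error probability
    beta = std_normal_cdf((c - theta1) / se)      # Type II error probability
    return alpha, beta


# Shrink the rejection region (raise c) while the sample size stays fixed at n = 9.
cutoffs = [6.0, 6.5, 7.0, 7.5]
sweep = [(c,) + alpha_beta(c) for c in cutoffs]

for c, alpha, beta in sweep:
    print(f"c = {c:.1f}: alpha = {alpha:.6f}, beta = {beta:.6f}")
```

As the cutoff moves up, the printed α values fall while the β values climb, so with n fixed neither error probability can be driven down without paying for it in the other.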
                                    So, then how should one proceed to define a test for $H_0$ versus $H_1$ which
                                 can be called the “best”? We discuss the Neyman-Pearson formulation of the
                                 testing problem in its generality in the next subsection.