Page 240 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
Exercises
5.23 Run the non-parametric counterparts of the tests used in Exercises 4.9, 4.10 and 4.20.
Compare the results and the power of the tests with those obtained using parametric
tests.
5.24 Using appropriate non-parametric tests, determine which variables of the Wines’
dataset best discriminate the white wines from the red wines.
5.25 The Neonatal dataset contains mortality data for deliveries taking place at home (MH)
and at a Health Centre (MI). Assess whether there are significant differences at the 5%
level between delivery conditions, using the sign and the Wilcoxon tests.
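
As an illustration of the first of the two tests named in Exercise 5.25, the following is a minimal sketch of an exact two-sided sign test in Python, using only the standard library. The function name `sign_test` and the paired figures are hypothetical and are not taken from the Neonatal dataset; the Wilcoxon test is not covered here.

```python
from math import comb

def sign_test(x, y):
    """Exact two-sided sign test for paired samples; tied pairs are dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)

# Hypothetical paired mortality figures (assumption, for illustration only)
mh = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 12.7, 8.9]
mi = [10.3, 9.1, 10.9, 10.6, 11.2, 8.7, 11.9, 8.1]

n_pos, n, p = sign_test(mh, mi)
print(f"positive differences: {n_pos} of {n}, two-sided p = {p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "do not reject H0 at 5%")
```

The sign test uses only the direction of each paired difference, so it usually has less power than the Wilcoxon test, which also uses the magnitudes.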
5.26 Consider the Firms’ dataset containing productivity figures (P) for a sample of
Portuguese firms in four branches of activity (BRANCH). Study the dataset in order to:
a) Assess at a 5% level of significance whether there are significant differences
among the productivity medians of the four branches.
b) Assess at a 1% level of significance whether Commerce and Industry have
significantly different medians.
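
One possible approach to part a) of Exercise 5.26 is the Kruskal-Wallis test. The sketch below computes the H statistic with average ranks for ties but without the tie correction factor, and compares it with the 5% chi-square critical value for 3 degrees of freedom (7.815). The function names, branch labels and productivity figures are hypothetical and are not taken from the Firms’ dataset.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical productivity figures per branch (assumption, for illustration)
branches = {
    "Commerce": [3.1, 2.8, 3.6, 4.0, 2.9],
    "Industry": [4.2, 4.8, 3.9, 5.1, 4.5],
    "Branch 3": [2.2, 2.7, 3.0, 2.5],
    "Branch 4": [3.3, 3.8, 4.1, 3.5, 3.7],
}
h = kruskal_wallis_h(list(branches.values()))
CRITICAL_5PCT_DF3 = 7.815  # chi-square 0.95 quantile, 3 degrees of freedom
print(f"H = {h:.3f}")
print("reject H0 at 5%" if h > CRITICAL_5PCT_DF3 else "do not reject H0 at 5%")
```

The chi-square approximation is asymptotic; for very small groups, exact tables are the safer reference.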
5.27 Apply the appropriate non-parametric test in order to rank the discriminative capability
of the features used to characterise the tissue types in the Breast Tissue dataset.
5.28 Redo Exercise 5.27 for the CTG dataset and the three-class discrimination
expressed by the grouping variable NSP.
5.29 Consider the discrimination of the three clay types based on the sample data of the
Clays’ dataset. Show that the null hypothesis of equal medians for the three clay
types is:
a) Rejected with more than 95% confidence for all grading variables (LG, MG, HG).
b) Not rejected for the iron oxide features.
c) Rejected with higher confidence for the lime (CaO) than for the silica (SiO₂).
5.30 The FHR dataset contains measurements of basal heart rate performed by three human
experts and an automatic diagnostic system. Assess whether the null hypothesis of
equal median measurements can be accepted at a 5% significance level for the three
human experts and the automatic diagnostic system.
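
Since the same cases in Exercise 5.30 are measured by all four raters, one plausible choice is the Friedman test for related samples. The sketch below computes the Friedman statistic under the simplifying assumption that there are no ties within a row, and compares it with the 5% chi-square critical value for 3 degrees of freedom (7.815). The function name and the heart-rate readings are hypothetical and are not taken from the FHR dataset.

```python
def friedman_statistic(rows):
    """Friedman statistic for n blocks (rows) and k treatments (columns).

    Assumes no tied values within any row.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0] * k
    for row in rows:
        # Rank the k measurements within this block, from 1 (smallest) to k
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical readings: columns are expert 1, expert 2, expert 3, system
readings = [
    [135, 138, 132, 140],
    [142, 145, 141, 147],
    [128, 127, 130, 133],
    [150, 152, 149, 155],
    [136, 139, 134, 138],
]
stat = friedman_statistic(readings)
CRITICAL_5PCT_DF3 = 7.815  # chi-square 0.95 quantile, 3 degrees of freedom
print(f"Friedman statistic = {stat:.3f}")
print("reject H0 at 5%" if stat > CRITICAL_5PCT_DF3 else "do not reject H0 at 5%")
```

Real heart-rate data are likely to contain ties within a case, in which case average ranks and a tie-corrected statistic would be needed.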
5.31 When analysing the contents of questions Q4, Q5, Q6 and Q7, someone said that “these
questions are essentially evaluating the same thing”. Assess whether this statement can
be accepted at a 5% significance level. Compute the coefficient of agreement κ and
discuss its significance.
5.32 The Programming dataset contains results of an enquiry regarding freshmen’s
previous knowledge of programming (PROG), Boole’s Algebra (AB), binary
arithmetic (BA) and computer hardware (H). Consider the variables PROG, AB, BA
and H dichotomised in a “yes/no” fashion. Can one reject with 99% confidence the
hypothesis that the four dichotomised variables essentially evaluate the same thing?
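
For related dichotomous variables such as those in Exercise 5.32, one plausible choice is Cochran’s Q test. The sketch below computes Q from a table of 0/1 answers and compares it with the 1% chi-square critical value for 3 degrees of freedom (11.345). The function name and the answers are hypothetical and are not taken from the Programming dataset.

```python
def cochran_q(rows):
    """Cochran's Q for n subjects (rows) and k dichotomous variables (columns).

    Each entry must be 0 or 1. Q is undefined (division by zero) when every
    row is all zeros or all ones.
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator

# Hypothetical yes(1)/no(0) answers; columns are PROG, AB, BA, H
answers = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
q = cochran_q(answers)
CRITICAL_1PCT_DF3 = 11.345  # chi-square 0.99 quantile, 3 degrees of freedom
print(f"Q = {q:.3f}")
print("reject H0 at 1%" if q > CRITICAL_1PCT_DF3 else "do not reject H0 at 1%")
```

Under the null hypothesis that the k variables have the same proportion of “yes” answers, Q is approximately chi-square distributed with k - 1 degrees of freedom.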