Page 100 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
Exercises
2.13 Determine the box plots of the Breast Tissue variables I0 through PERIM, for the
6 classes of breast tissue. By visual inspection of the results, organise a table describing
which class discriminations can be expected to be well accomplished by each variable.
2.14 Consider the two variables MH = “neonatal mortality rate at home” and MI = “neonatal
mortality rate at Health Centre” of the Neonatal dataset. Determine the histograms
and compare the two variables in terms of their skewness and kurtosis.
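The comparison in Exercise 2.14 rests on the sample skewness and kurtosis. The following is a minimal Python sketch of the moment-based estimates. It assumes the plain (biased) moment definitions and reports kurtosis as excess kurtosis, which is 0 for a normal distribution. The function name is hypothetical. SPSS, STATISTICA, MATLAB and R may apply small-sample corrections or a different kurtosis convention, so their values can differ slightly.

```python
def skewness_kurtosis(x):
    """Return (skewness, excess kurtosis) from the biased central moments of x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal distribution gives 0
    return skew, excess_kurt
```

A positive skewness indicates a longer right tail. A negative excess kurtosis indicates a flatter-than-normal distribution.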
2.15 Determine the scatter plot and correlation coefficient of the MH and MI variables of the
previous exercise. Comment on the results.
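A minimal Python sketch of the Pearson correlation coefficient used in Exercise 2.15 follows. Applied pairwise, the same helper also builds the correlation matrices asked for in Exercises 2.17 and 2.19. The helper names are hypothetical, and the inputs are plain lists rather than the book's datasets.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation sum
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations between the given variables."""
    return [[pearson_r(a, b) for b in columns] for a in columns]
```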
2.16 Determine the histograms, skewness and kurtosis of the BPD, CP and AP variables of
the Foetal Weight dataset. Which variable is best suited to normal modelling?
Why?
2.17 Determine the correlation matrix of the BPD, CP and AP variables of the previous
exercise. Comment on the results.
2.18 Determine the correlation between variables I0 and HFS of the Breast Tissue
dataset. Check with the scatter plot that the very low correlation of those two variables
does not mean that there is no relation between them. Compute the new variable
I0S = (I0 − 1235)² and show that there is a significant correlation between this new variable
and HFS.
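The point of Exercise 2.18 can be reproduced on made-up data. A symmetric U-shaped relation has a Pearson correlation of zero. Squaring the deviation from the centre of symmetry, however, turns it into a perfect linear relation. The values below are illustrative only and are not the Breast Tissue data. The centre c = 5 is a hypothetical stand-in for the value 1235 used for I0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: y depends on x only through the squared distance from c.
c = 5
x = [2, 3, 4, 5, 6, 7, 8]
y = [(v - c) ** 2 for v in x]

r_raw = pearson_r(x, y)             # 0: no *linear* association at all
x_sq = [(v - c) ** 2 for v in x]    # analogue of I0S = (I0 - 1235)^2
r_transformed = pearson_r(x_sq, y)  # 1: the relation is now linear

print(f"r(x, y) = {r_raw:.3f}, r(x_sq, y) = {r_transformed:.3f}")
```

With real data the transformed correlation will not be exactly 1. Its significance still has to be assessed, as the exercise asks.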
2.19 Perform the following statistical analyses on the Rocks dataset:
a) Determine the histograms, skewness and kurtosis of the variables and categorise
them into the following categories: left asymmetric; right asymmetric; symmetric;
symmetric and almost normal.
b) Compute the correlation matrix for the mechanical test variables and comment on
the high correlations between RMCS and RCSG and between AAPN and PAOA.
c) Compute the correlation matrix for the chemical composition variables and
determine which variables have the highest positive and negative correlations with
silica (SiO₂) and which variable has the highest positive correlation with titanium
oxide (TiO₂).
2.20 Student performance in a first-year university course on Programming can be partly
explained by prior knowledge of the subject. To assess this statement, use
the SCORE and PROG variables of the Programming dataset. The first
variable represents the final examination score in Programming (in [0, 20]), and the
second variable categorises the students' prior knowledge. Using three SCORE categories
(Poor if SCORE < 10, Fair if 10 ≤ SCORE < 15, and Good if SCORE ≥ 15), determine:
a) The Spearman correlation between the two variables.
b) The contingency table of the two variables.
c) The gamma statistic.
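The computations in Exercise 2.20 can be sketched in Python as follows. The sketch maps SCORE to the three categories and computes Spearman's rank correlation, which is Pearson's r on tie-averaged ranks. It also builds the contingency table and computes the Goodman–Kruskal gamma, (C − D)/(C + D), where C and D count concordant and discordant pairs of observations. The sketch assumes, hypothetically, that PROG is coded as ordered integers starting at 0; check the dataset's actual coding. All function names are illustrative.

```python
import math

def score_category(score):
    """Map SCORE in [0, 20] to 0 = Poor (< 10), 1 = Fair (10 to < 15), 2 = Good (>= 15)."""
    if score < 10:
        return 0
    if score < 15:
        return 1
    return 2

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def contingency_table(row_codes, col_codes, n_rows, n_cols):
    """Count joint occurrences of integer-coded row and column categories."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(row_codes, col_codes):
        table[r][c] += 1
    return table

def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a strictly later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)
```

Given hypothetical lists prog and score, one would form cats = [score_category(s) for s in score]. Then call spearman_rho(prog, score), build contingency_table(prog, cats, n_prog_levels, 3), and pass that table to goodman_kruskal_gamma. Gamma is undefined (division by zero) when every pair is tied.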
2.21 Show examples of 2×2 contingency tables for nominal data corresponding to φ = 1, −1
and 0, and to λ, λ_rc and λ_cr = 1 and 0.
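Candidate tables for Exercise 2.21 can be checked with a short Python sketch. φ uses the usual 2×2 formula (ad − bc)/√((a+b)(c+d)(a+c)(b+d)). The lambda functions implement Goodman–Kruskal's lambda for predicting the column from the row, for predicting the row from the column, and in its symmetric form. Which of these corresponds to the book's λ_rc and λ_cr is an assumption to verify against the book's own definitions. The function names are hypothetical.

```python
import math

def phi_coefficient(table):
    """phi for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def lambda_col_given_row(table):
    """Goodman-Kruskal lambda: proportional error reduction predicting column from row."""
    n = sum(sum(row) for row in table)
    max_col_total = max(sum(col) for col in zip(*table))
    sum_row_max = sum(max(row) for row in table)  # best guess within each row
    return (sum_row_max - max_col_total) / (n - max_col_total)

def lambda_row_given_col(table):
    """Goodman-Kruskal lambda predicting row from column (transpose, then reuse)."""
    return lambda_col_given_row([list(col) for col in zip(*table)])

def lambda_symmetric(table):
    """Symmetric Goodman-Kruskal lambda, pooling both prediction directions."""
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    max_col_total = max(sum(col) for col in zip(*table))
    num = (sum(max(row) for row in table) + sum(max(col) for col in zip(*table))
           - max_col_total - max_row_total)
    den = 2 * n - max_col_total - max_row_total
    return num / den
```

For example, phi_coefficient([[5, 0], [0, 5]]) returns 1.0, and the lambda functions also return 1.0 for that table. Note that the functions divide by zero when a row or column total is zero (φ) or when one category holds every observation (λ).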