Applied Statistics Using SPSS, STATISTICA, MATLAB and R
2 Presenting and Summarising the Data
2.5 Determine the histograms of variables LB, ASTV, MSTV, ALTV and MLTV of the
CTG dataset using Sturges’ rule for the number of bins. Compute the skewness and
kurtosis of the variables and check the following statements:
a) The distribution of LB is well modelled by the normal distribution.
b) The distribution of ASTV is symmetric, bimodal and flatter than the normal
distribution.
c) The distribution of ALTV is left skewed and more peaked than the normal
distribution.
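The quantities needed for Exercise 2.5 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular package. It assumes Sturges' rule in the form 1 + log2(n), rounded up (the rounding convention is an assumption), and it uses the plain moment definitions of skewness and kurtosis, whereas statistical packages often apply small-sample corrections and may therefore report slightly different values. The function names and the two short samples are made up for illustration and are not taken from the CTG dataset.

```python
import math

def sturges_bins(n):
    """Number of histogram bins by Sturges' rule, rounded up (an assumption)."""
    return math.ceil(1 + math.log2(n))

def central_moment(x, k):
    """k-th central moment of the sample x (divides by n, no bias correction)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)

def skewness(x):
    """Moment skewness: 0 if symmetric, > 0 if right skewed, < 0 if left skewed."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def excess_kurtosis(x):
    """Moment kurtosis minus 3: < 0 is flatter than the normal, > 0 more peaked."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3

# Made-up values for illustration only (not taken from the CTG dataset).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print("bins for n = 1000:", sturges_bins(1000))
print("symmetric sample:", skewness(symmetric), excess_kurtosis(symmetric))
print("right-skewed sample:", skewness(right_skewed), excess_kurtosis(right_skewed))
```

Applied to a real variable, the sign of the skewness and of the excess kurtosis gives a first check of statements such as b) and c); the histogram is still needed to judge features like bimodality.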
2.6 Taking into account the values of the skewness and kurtosis computed for variables
ASTV and ALTV in the previous exercise, which distributions should be selected as
candidates for modelling these variables (see Figure 2.24)?
2.7 Consider the bacterial counts in three organs – the spleen, liver and lungs – included in
the Cells dataset (datasheet CFU). Using box plots, compare the cell counts in the
three organs 2 weeks and 2 months after infection. Also, determine which organs have
the lowest and highest spread of bacterial counts.
2.8 The inter-quartile ranges of the bacterial counts in the spleen and in the liver after 2
weeks have similar values. However, the range of the bacterial counts is much smaller
in the spleen than in the liver. Explain what causes this discrepancy and comment on
the value of the range as a measure of spread.
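As a starting point for Exercise 2.8, the following sketch compares the inter-quartile range and the range of two small samples. The samples are hypothetical and chosen only so that they differ in their largest value; they are not the bacterial counts of the Cells dataset. The helper names are illustrative, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile definition of other packages.

```python
import statistics

def iqr(x):
    """Inter-quartile range, using the default quartile method of the statistics module."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    return q3 - q1

def value_range(x):
    """Range: difference between the largest and the smallest value."""
    return max(x) - min(x)

# Hypothetical samples (not the Cells data); they differ only in the last value.
sample_a = [10, 12, 13, 15, 16, 18, 20]
sample_b = [10, 12, 13, 15, 16, 18, 95]

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, "IQR =", iqr(sample), "range =", value_range(sample))
```

Running the same two functions on the spleen and liver counts, alongside the box plots of Exercise 2.7, shows which observations drive each measure.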
2.9 Determine the overlaid scatter plot of the three types of clays (Clays’ dataset), using
variables SiO2 and Al2O3. Also, determine the correlation between both variables and
comment on the results.
2.10 The Moulds’ dataset contains measurements of bottle bottoms performed by three
methods. Determine the correlation matrix for the three methods before and after
subtracting the nominal value of 34 mm and explain why the same correlation results
are obtained. Also, express your judgement on the measurement methods taking into
account their low correlation.
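The first part of Exercise 2.10 can be checked numerically with a hand-written Pearson correlation, as sketched below. The measurement values are invented for illustration and are not taken from the Moulds' dataset; only the nominal value of 34 mm comes from the exercise. The function name `pearson` is likewise an illustrative choice.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

NOMINAL = 34.0  # mm, the nominal value stated in the exercise

# Invented measurements for illustration only (not the Moulds' data).
method_a = [34.1, 33.9, 34.3, 34.0, 33.8]
method_b = [34.0, 34.2, 33.9, 34.1, 34.3]

r_raw = pearson(method_a, method_b)
r_shifted = pearson([v - NOMINAL for v in method_a],
                    [v - NOMINAL for v in method_b])

print("before subtraction:", r_raw)
print("after subtraction: ", r_shifted)
```

The two printed values agree up to floating-point rounding. Every term inside `pearson` is a deviation from the sample mean, which is a useful hint for the explanation the exercise asks for.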
2.11 The Culture dataset contains percentages of budget assigned to cultural activities in
several Portuguese boroughs randomly sampled from three regions, coded 1, 2 and 3.
Determine the correlations among the several cultural activities and consider them to be
significant if they are higher than 0.4. Comment on the following statements:
a) The high negative correlation between “Halls” and “Sport” is due to chance alone.
b) Whenever there is a good investment in “Cine”, there is also a good investment
either in “Music” or in “Fine Arts”.
c) In the northern boroughs, a high investment in “Heritage” causes a low investment
in “Sport”.
2.12 Consider the “Halls” variable of the Culture dataset:
a) Determine the overall frequency table and histogram, starting at zero and with bin
width 0.02.
b) Determine the mean and median. Which of these statistics should be used as
the measure of location, and why?
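The two steps of Exercise 2.12 can be sketched as follows. The function `frequency_table` is a hypothetical helper, not part of any library, and the values in `halls` are invented stand-ins for the "Halls" variable rather than data from the Culture dataset. The sketch assumes that all values are at or above the starting point of the first bin and that bins are closed on the left; a tiny offset guards against floating-point error for values that fall exactly on a bin edge.

```python
import statistics
from collections import Counter

def frequency_table(values, width, start=0.0):
    """Return (lower, upper, count) for each bin [lower, upper), beginning at start.

    Assumes every value is >= start. The 1e-9 offset keeps values lying exactly
    on a bin edge from dropping into the bin below through rounding error.
    """
    counts = Counter(int((v - start) / width + 1e-9) for v in values)
    table = []
    for k in range(max(counts) + 1):
        table.append((start + k * width, start + (k + 1) * width, counts.get(k, 0)))
    return table

# Invented values for illustration only (not the Culture data).
halls = [0.00, 0.01, 0.01, 0.03, 0.03, 0.05, 0.21]

table = frequency_table(halls, width=0.02)
for lower, upper, count in table:
    print(f"[{lower:.2f}, {upper:.2f})  {count}")

print("mean   =", statistics.mean(halls))
print("median =", statistics.median(halls))
```

With these invented values a single large observation pulls the mean above the median; whether the real "Halls" data behave in the same way is what part b) asks the reader to check.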