Page 99 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

78       2 Presenting and Summarising the Data


           2.5  Determine the histograms of variables LB, ASTV, MSTV, ALTV and MLTV of the
               CTG dataset using Sturges’ rule for the number of bins. Compute the skewness and
               kurtosis of the variables and check the following statements:
               a)  The distribution of LB is well modelled by the normal distribution.
               b)  The distribution of ASTV is symmetric, bimodal and flatter than the normal
                   distribution.
               c)  The distribution of ALTV is left skewed and more peaked than the normal
                   distribution.
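A minimal Python sketch of the two computations in Exercise 2.5. It assumes Sturges' rule in the form k = ceil(1 + log2 n) (some texts round to the nearest integer instead) and the moment-based estimators g1 = m3 / m2^(3/2) and g2 = m4 / m2^2 - 3, where m_k is the k-th central moment with divisor n. Statistical packages often report bias-adjusted versions, so small-sample values may differ slightly. A negative g1 indicates a longer left tail; a negative g2 indicates a distribution flatter than the normal and a positive g2 a more peaked one. The helper names and the toy sample are illustrative, not taken from the CTG dataset.

```python
import math
from statistics import fmean


def sturges_bins(n: int) -> int:
    """Number of histogram bins by Sturges' rule, 1 + log2(n), rounded up (assumed convention)."""
    return math.ceil(1 + math.log2(n))


def moment_skew_kurt(x):
    """Moment-based sample skewness g1 and excess kurtosis g2.

    Central moments use divisor n (no small-sample bias correction),
    so g2 = 0 for a normal distribution, < 0 flatter, > 0 more peaked.
    """
    n = len(x)
    mu = fmean(x)
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Toy values for illustration only (not the CTG data).
sample = [1, 2, 3, 4, 5]
print(sturges_bins(len(sample)))  # 1 + log2(5) = 3.32, rounded up to 4
print(moment_skew_kurt(sample))   # symmetric sample, so g1 is 0
```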

           2.6  Taking into account the values of the skewness and kurtosis computed for variables
               ASTV and ALTV in the previous Exercise, which distributions should be selected as
               candidates for modelling these variables (see Figure 2.24)?

           2.7  Consider the bacterial counts in three organs (the spleen, liver and lungs) included in
               the Cells dataset (datasheet CFU). Using box plots, compare the cell counts in the
               three organs 2 weeks and 2 months after infection. Also, determine which organs have
               the lowest and highest spread of bacterial counts.
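The statistics a box plot displays can also be computed directly. The sketch below assumes the common convention of whiskers reaching the most extreme observations within 1.5 × IQR of the quartiles. Quartile definitions also vary between packages, so SPSS, STATISTICA, MATLAB and R may give slightly different values. `box_plot_stats` and the counts are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(x):
    """Quartiles, IQR, whisker ends (1.5 x IQR fences) and outliers of a sample."""
    q1, q2, q3 = quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme observations within the fences.
        "whiskers": (min(inside), max(inside)),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": sorted(v for v in x if v < lo_fence or v > hi_fence),
    }


# Hypothetical counts, for illustration only (not the Cells data).
counts = [10, 12, 13, 15, 16, 18, 40]
print(box_plot_stats(counts))
```

The IQR returned here is one natural way to rank the organs by spread, since it is the length of the box itself.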

           2.8  The inter-quartile ranges of the bacterial counts in the spleen and in the liver after 2
               weeks have similar values. However, the range of the bacterial counts is much smaller
               in the spleen than in the liver. Explain what causes this discrepancy and comment on
               the value of the range as a spread measure.
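The kind of discrepancy described in Exercise 2.8 can be reproduced with two hypothetical samples that differ only in their largest value: the range depends solely on the two extreme observations, whereas the IQR depends only on the central half of the sorted data. The helper names are illustrative and the quartile convention (`method="inclusive"`) is an assumption.

```python
from statistics import quantiles


def iqr(x):
    """Inter-quartile range Q3 - Q1."""
    q1, _, q3 = quantiles(x, n=4, method="inclusive")
    return q3 - q1


def value_range(x):
    """Range: largest minus smallest observation."""
    return max(x) - min(x)


# Two hypothetical samples that differ only in their largest value.
a = [10, 12, 13, 15, 16, 18, 20]
b = [10, 12, 13, 15, 16, 18, 200]
print(iqr(a), iqr(b))                  # 4.5 4.5
print(value_range(a), value_range(b))  # 10 190
```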

           2.9  Determine the overlaid scatter plot of the three types of clays (Clays’ dataset), using
               variables SiO₂ and Al₂O₃. Also, determine the correlation between the two variables and
               comment on the results.

           2.10 The Moulds’ dataset contains measurements of bottle bottoms performed by three
               methods. Determine the correlation matrix for the three methods before and after
               subtracting the nominal value of 34 mm and explain why the same correlation results
               are obtained. Also, express your judgement on the measurement methods taking into
               account their low correlation.

           2.11 The Culture dataset contains percentages of budget assigned to cultural activities in
               several Portuguese boroughs randomly sampled from three regions, coded 1, 2 and 3.
               Determine the correlations among the several cultural activities and consider them to be
               significant if they are higher than 0.4. Comment on the following statements:
               a)  The high negative correlation between “Halls” and “Sport” is due to chance alone.
               b)  Whenever there is a good investment in “Cine”, there is also a good investment
                   either in “Music” or in “Fine Arts”.
               c)  In the northern boroughs, a high investment in “Heritage” causes a low investment
                   in “Sport”.

           2.12 Consider the “Halls” variable of the Culture dataset:
               a)  Determine the overall frequency table and histogram, starting at zero and with bin
                   width 0.02.
               b)  Determine the mean and median. Which of these statistics should be used as a
                   location measure and why?
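A standard-library sketch of Exercise 2.12, assuming left-closed bins [0, 0.02), [0.02, 0.04) and so on. Whether bin edges are left- or right-closed differs between packages, so values lying exactly on an edge may be counted in a different bin. `freq_table` is a hypothetical helper, and the values are hypothetical and right-skewed, a case where a few large values pull the mean above the median.

```python
import math
from collections import Counter
from statistics import fmean, median


def freq_table(x, width=0.02, start=0.0):
    """(left edge, count) pairs for bins [start + k*width, start + (k+1)*width)."""
    # Rounding the ratio before flooring guards against floating-point
    # error for values that lie exactly on a bin edge.
    counts = Counter(math.floor(round((v - start) / width, 9)) for v in x)
    return [(round(start + k * width, 10), counts.get(k, 0))
            for k in range(max(counts) + 1)]


# Hypothetical "Halls" proportions (not the Culture data).
halls = [0.00, 0.01, 0.01, 0.02, 0.03, 0.03, 0.05, 0.15]
for left, count in freq_table(halls):
    print(f"[{left:.2f}, {left + 0.02:.2f}): {count}")
print(f"mean = {fmean(halls):.4f}, median = {median(halls):.4f}")  # mean = 0.0375, median = 0.0250
```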