Page 100 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

Exercises


           2.13 Determine the box plots of the Breast Tissue variables I0 through PERIM, for the
               6 classes of breast tissue. By visual inspection of the results, organise a table describing
               which class discriminations can be expected to be well accomplished by each variable.

           2.14 Consider the two variables MH = “neonatal mortality rate at home” and MI = “neonatal
               mortality rate at Health Centre” of the Neonatal dataset. Determine the histograms
               and compare both variables according to the skewness and kurtosis.
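               As a starting point for the comparison, the two shape measures can be sketched in
               plain Python. This is a minimal sketch using the moment-based (population)
               definitions; statistical packages often report bias-corrected estimates, so their
               values may differ slightly. The names skewness, excess_kurtosis and sample are
               illustrative, and the sample values are made up, not taken from the Neonatal dataset.

```python
def central_moment(xs, order):
    """Return the central moment of the given order (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** order for x in xs) / n


def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2). Positive means a longer right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (zero for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# Made-up values with one large observation, so the right tail is longer.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

print("skewness:", skewness(sample))
print("excess kurtosis:", excess_kurtosis(sample))
```

               Applying both functions to each variable gives numbers that can be set beside the
               histograms when comparing the two distributions.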

           2.15 Determine the scatter plot and correlation coefficient of the MH and MI variables of the
               previous exercise. Comment on the results.

           2.16 Determine the histograms, skewness and kurtosis of the BPD, CP and AP variables of
               the Foetal Weight dataset. Which variable is better suited to normal modelling?
               Why?

           2.17 Determine the correlation matrix of the BPD, CP and AP variables of the previous
               exercise. Comment on the results.

           2.18 Determine the correlation between variables I0 and HFS of the Breast Tissue
               dataset. Check with the scatter plot that the very low correlation of those two variables
               does not mean that there is no relation between them. Compute the new variable
               I0S = (I0 – 1235)² and show that there is a significant correlation between this new
               variable and HFS.
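               The effect this exercise targets can be illustrated with synthetic data. In the
               sketch below, y depends on x only through the squared distance from a centre, so
               the linear correlation between x and y is close to zero while the correlation
               between the squared, centred variable and y is close to one. The centre value 10
               and all data are hypothetical stand-ins; they are not the Breast Tissue values, and
               the real data will show a noisier picture.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical centre, playing the role that 1235 plays for I0 in the exercise.
centre = 10.0

# Synthetic x values placed symmetrically around the centre.
x = [centre + d for d in range(-5, 6)]

# Synthetic y values that depend on x only through (x - centre)^2.
y = [3.0 + 0.5 * (xi - centre) ** 2 for xi in x]

# Transformed variable, analogous to I0S = (I0 - 1235)^2.
x_sq = [(xi - centre) ** 2 for xi in x]

r_raw = pearson(x, y)      # linear correlation of the raw variable with y
r_sq = pearson(x_sq, y)    # linear correlation of the transformed variable with y

print("raw:", r_raw)
print("transformed:", r_sq)
```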

           2.19 Perform the following statistical analyses on the Rocks’ dataset:
               a)  Determine the histograms, skewness and kurtosis of the variables and categorise
                   them into the following categories: left asymmetric; right asymmetric; symmetric;
                   symmetric and almost normal.
               b)  Compute the correlation matrix for the mechanical test variables and comment on
                   the high correlations between RMCS and RCSG and between AAPN and PAOA.
               c)  Compute the correlation matrix for the chemical composition variables and
                   determine which variables have the highest positive and negative correlations with
                   silica (SiO₂) and which variable has the highest positive correlation with titanium
                   oxide (TiO₂).

           2.20 Student performance in a first-year university course on Programming can be partly
               explained by previous knowledge of the subject. In order to assess this statement, use
               the SCORE and PROG variables of the Programming dataset, where the first
               variable represents the final examination score on Programming (in [0, 20]) and the
               second variable categorises the previous knowledge. Using three SCORE categories
               (Poor if SCORE < 10, Fair if 10 ≤ SCORE < 15, and Good if SCORE ≥ 15), determine:
               a)  The Spearman correlation between the two variables.
               b)  The contingency table of the two variables.
               c)  The gamma statistic.
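               Parts b) and c) can be sketched as follows, assuming the Goodman-Kruskal definition
               of gamma, (P − Q)/(P + Q), where P and Q count concordant and discordant pairs. The
               coding of PROG as the ordered levels 0, 1, 2 is an assumption made for illustration
               (the exercise does not state the number of PROG categories), and the scores and
               levels below are made up, not taken from the Programming dataset.

```python
def score_category(score):
    """Map a score in [0, 20] to 0 = Poor (< 10), 1 = Fair (10 to < 15), 2 = Good (>= 15)."""
    if score < 10:
        return 0
    if score < 15:
        return 1
    return 2


def contingency_table(rows, cols, n_rows, n_cols):
    """Count how often each (row category, column category) pair occurs."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(rows, cols):
        table[r][c] += 1
    return table


def gamma_statistic(table):
    """Goodman-Kruskal gamma for a table whose rows and columns are both ordered."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in lower rows only
                for m in range(n_cols):
                    if m > j:                   # lower and to the right: same ordering
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower and to the left: opposite ordering
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Made-up data: examination scores and an assumed 0/1/2 coding of previous knowledge.
scores = [5, 8, 9, 11, 12, 14, 16, 18, 13, 7]
prog = [0, 0, 1, 1, 0, 2, 2, 2, 1, 0]

table = contingency_table([score_category(s) for s in scores], prog, 3, 3)
gamma = gamma_statistic(table)

for row in table:
    print(row)
print("gamma:", gamma)
```

               Tied pairs (same row or same column) are ignored by gamma, which is why only cells
               in strictly lower rows and strictly different columns enter the two counts.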

           2.21 Show examples of 2×2 contingency tables for nominal data corresponding to φ = 1, −1,
               0 and to λ, λ_rc and λ_cr = 1 and 0.
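               Candidate tables can be checked numerically. The sketch below assumes the common
               definition of the phi coefficient for a 2×2 table with cells a, b (first row) and
               c, d (second row): φ = (ad − bc) / sqrt((a + b)(c + d)(a + c)(b + d)). The sign
               convention in the text may differ, and the λ measures are not covered here. The
               function name phi and the example table are illustrative only.

```python
import math


def phi(table):
    """Phi coefficient of a 2x2 table [[a, b], [c, d]] with non-zero marginal totals."""
    (a, b), (c, d) = table
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Illustrative table with all cases on the main diagonal.
candidate = [[5, 0], [0, 5]]
print("phi:", phi(candidate))
```

               The function is undefined when any row or column total is zero, so such tables
               need to be handled separately.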