Page 94 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data


           2.3.6 Measures of Association for Nominal Variables
           Assume we have a multivariate dataset whose variables are of nominal type and we
           intend to measure their level of association. In this case, the correlation coefficient
           approach cannot be applied, since covariances and standard deviations are not
           defined for nominal data. We need another approach, one that uses the contingency
           table information much as we did when computing the gamma coefficient for
           ordinal data.


           Commands 2.11. SPSS, STATISTICA, MATLAB and R commands used to
           obtain measures of association for nominal variables.
             SPSS          Analyze; Descriptive
                           Statistics; Crosstabs
             STATISTICA    Statistics; Basic Statistics/Tables;
                           Tables and Banners; Options
             MATLAB        kappa(x,alpha)

             R             kappa(x,alpha)



           Measures of association for nominal variables are obtained in SPSS and
           STATISTICA as a result of applying contingency table analysis (see Commands
           5.7).
              SPSS can compute the kappa statistic only when the values of the first
           variable match the values of the second variable. STATISTICA does not provide
           the kappa statistic.
              The MATLAB Statistics toolbox and the R stats package do not provide a function for
           computing the kappa statistic. We provide, however, MATLAB and R functions
           for that purpose on the book CD (see Appendix F).



           2.3.6.1  The Phi Coefficient
           Let us first consider a bivariate dataset with nominal variables that have only two
           values (dichotomous variables), as in the case of the 2×2 contingency table shown
           in Table 2.11.
              In the case of a full association between the two variables, one would obtain a 100%
           frequency for the values along the main diagonal of the table, and 0% elsewhere.
           Based on this observation, the following index of association, φ (phi coefficient),
           is defined:

              \[
              \phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \tag{2.26}
              \]
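
           As a rough illustration of formula 2.26, the following Python sketch computes φ
           from the four cell counts of a 2×2 table. Python is used here only for
           convenience; it is not one of the four packages covered by the book. The
           function name phi_coefficient is hypothetical, the counts are made up, and the
           layout is an assumption: a and b are taken as the first row of the table, and c
           and d as the second row, so that a and d lie on the main diagonal.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2x2 contingency table.

    Assumed layout (not confirmed by the text): rows are [a, b] and [c, d].
    """
    # Product of the two row totals and the two column totals.
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts, not taken from any dataset in the book.
print(phi_coefficient(10, 0, 0, 5))  # all cases on the main diagonal
print(phi_coefficient(0, 10, 5, 0))  # all cases off the main diagonal
print(phi_coefficient(4, 6, 2, 3))   # rows with identical proportions
```

           With all cases on the main diagonal the sketch returns 1, with all cases off the
           diagonal it returns −1, and with proportional rows (ad = bc) it returns 0.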