Page 94 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
2.3 Summarising the Data 73
2.3.6 Measures of Association for Nominal Variables
Assume we have a multivariate dataset whose variables are of nominal type and we
intend to measure their level of association. In this case, the correlation coefficient
approach cannot be applied, since covariance and standard deviations are not
applicable to nominal data. We need another approach, one that uses the contingency
table information much as we did when computing the gamma coefficient for
ordinal data.
Commands 2.11. SPSS, STATISTICA, MATLAB and R commands used to
obtain measures of association for nominal variables.

SPSS         Analyze; Descriptive Statistics; Crosstabs
STATISTICA   Statistics; Basic Statistics/Tables; Tables and Banners; Options
MATLAB       kappa(x,alpha)
R            kappa(x,alpha)

Measures of association for nominal variables are obtained in SPSS and
STATISTICA as a result of applying contingency table analysis (see Commands
5.7).
The kappa statistic can be computed with SPSS only when the values of the first
variable match the values of the second variable. STATISTICA does not provide
the kappa statistic.
The MATLAB Statistics toolbox and the R stats package do not provide a function
for computing the kappa statistic. We do, however, provide MATLAB and R functions
for that purpose on the book CD (see Appendix F).
2.3.6.1 The Phi Coefficient
Let us first consider a bivariate dataset with nominal variables that only have two
values (dichotomous variables), as in the case of the 2×2 contingency table shown
in Table 2.11.
In the case of a full association of both variables, all observations would fall in
the cells along the main diagonal of the table (100% of the frequency), with 0% in
the off-diagonal cells.
Based on this observation, the following index of association, φ (phi coefficient),
is defined:
\[ \phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \tag{2.26} \]
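
As an illustration of formula 2.26, the following is a minimal Python sketch, not one of the book's MATLAB or R functions. It assumes the 2×2 table is laid out with counts a, b in the first row and c, d in the second, which is consistent with the row totals (a + b), (c + d) and column totals (a + c), (b + d) appearing in the formula. The function name phi_coefficient and the example counts are hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts.

    Assumed layout (hypothetical, matching the totals in formula 2.26):
        first row:  a, b
        second row: c, d
    """
    # Product of the two row totals and the two column totals.
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # A zero row or column total makes the coefficient undefined.
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Illustrative counts only (not taken from the book's datasets).
print(phi_coefficient(50, 0, 0, 50))    # all cases on the main diagonal
print(phi_coefficient(25, 25, 25, 25))  # cases spread evenly over the cells
print(phi_coefficient(0, 50, 50, 0))    # all cases off the main diagonal
```

With all cases on the main diagonal the sketch returns 1, with evenly spread cases it returns 0, and with all cases off the diagonal it returns −1, which matches the reading of φ as an index of association.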