Page 96 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
with the highest frequency of occurrence (37.9%). In choosing this modal category,
we expect to be in error 62.1% of the time. On the other hand, if we know the sex
(i.e., we know the full table), we would choose as prediction outcome the “agree”
category for a male (expecting then 73.5 – 28 = 45.5% of errors), and the “fully
agree” category for a female (expecting then 26.5 – 11.4 = 15.1% of errors).
Let us denote:
i. Pe_c ≡ Percentage of errors using only the columns = 100 – percentage of
modal column category.
ii. Pe_cr ≡ Percentage of errors using also the rows = sum along the rows of (row
percentage – percentage of the modal column category in that row), with all
percentages taken relative to the table total.
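The two error percentages can be reproduced from the figures quoted above for Table 2.4. The following is a minimal sketch that uses only those quoted percentages; the variable names are illustrative and not part of the original text.

```python
# Percentages (relative to the table total) quoted in the text for Table 2.4.
modal_column_pct = 37.9                           # most frequent answer category overall
row_total_pct = {"male": 73.5, "female": 26.5}    # share of each sex in the sample
row_modal_pct = {"male": 28.0, "female": 11.4}    # modal answer category within each sex

# Errors when predicting from the column margins only.
pe_c = 100.0 - modal_column_pct

# Errors when the row (sex) is known: for each row, row total minus its modal cell.
pe_cr = sum(row_total_pct[s] - row_modal_pct[s] for s in row_total_pct)

print(round(pe_c, 1), round(pe_cr, 1))  # 62.1 60.6
```

Knowing the sex lowers the expected error only from 62.1% to 60.6%.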
The λ measure (Goodman and Kruskal lambda) of proportional reduction of
error, when predicting the columns from the rows, is defined as:
\[
\lambda_{cr} = \frac{Pe_c - Pe_{cr}}{Pe_c}. \tag{2.27}
\]
Similarly, for the prediction of the rows from the columns, we have:
\[
\lambda_{rc} = \frac{Pe_r - Pe_{rc}}{Pe_r}. \tag{2.28}
\]
The coefficient of mutual association (also called symmetric lambda) is a
weighted average of both lambdas, defined as:
\[
\lambda = \frac{\text{average reduction in errors}}{\text{average number of errors}}
        = \frac{(Pe_c - Pe_{cr}) + (Pe_r - Pe_{rc})}{Pe_c + Pe_r}. \tag{2.29}
\]
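The three lambdas can be sketched in a few lines of Python. This is an illustrative implementation, not the routine of any of the packages covered in the book: the function name `goodman_kruskal_lambda`, the helper `_ratio`, and the count tables are all hypothetical. It works with counts rather than percentages, which leaves the ratios unchanged, and returns NaN when a denominator is zero (an assumption of this sketch).

```python
import math


def _ratio(num, den):
    # Lambda is undefined when no prediction errors are possible; flag with NaN.
    return num / den if den else math.nan


def goodman_kruskal_lambda(table):
    """Return (lambda_cr, lambda_rc, lambda_sym) for a table of counts.

    `table` is a list of rows, each row a list of non-negative counts.
    """
    n = sum(sum(row) for row in table)
    columns = list(zip(*table))

    # Errors expressed as counts; dividing by n would give the percentages of the text.
    pe_c = n - max(sum(col) for col in columns)    # predict modal column category
    pe_cr = n - sum(max(row) for row in table)     # predict modal column within each row
    pe_r = n - max(sum(row) for row in table)      # predict modal row category
    pe_rc = n - sum(max(col) for col in columns)   # predict modal row within each column

    lambda_cr = _ratio(pe_c - pe_cr, pe_c)
    lambda_rc = _ratio(pe_r - pe_rc, pe_r)
    lambda_sym = _ratio((pe_c - pe_cr) + (pe_r - pe_rc), pe_c + pe_r)
    return lambda_cr, lambda_rc, lambda_sym


# Hypothetical 2 x 3 table of counts (not Table 2.4).
example = [[30, 10, 10],
           [5, 25, 20]]
l_cr, l_rc, l_sym = goodman_kruskal_lambda(example)
print(round(l_cr, 3), round(l_rc, 3), round(l_sym, 3))
```

With this sketch, a table whose rows fully determine the columns, such as `[[10, 0], [0, 20]]`, gives 1 for all three lambdas, while a flat table such as `[[10, 10], [10, 10]]` gives 0.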
The lambda measure always ranges between 0 and 1, with 0 meaning that the
independent variable is of no help in predicting the dependent variable and 1
meaning that the independent variable perfectly specifies the categories of the
dependent variable.
Example 2.10
Q: Compute the lambda statistics for Table 2.4.
A: Using formula 2.27 we find λ_cr = 0.024, suggesting that sex contributes
little to predicting the outcome of Q4. We also find λ_rc = 0 and λ = 0.017.
The significance of the lambda statistic will be discussed in Chapter 5.
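As a consistency check on Example 2.10, the values can be traced back to the percentages quoted earlier for Table 2.4. The check assumes that the modal row category is male (73.5%), so that the percentage of errors using only the rows is 100 – 73.5 = 26.5, and that the percentage of errors using also the columns equals the same 26.5, as implied by the reported zero value of the row lambda:

\[
\lambda_{cr} = \frac{62.1 - 60.6}{62.1} \approx 0.024, \qquad
\lambda = \frac{(62.1 - 60.6) + (26.5 - 26.5)}{62.1 + 26.5} = \frac{1.5}{88.6} \approx 0.017 .
\]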
2.3.6.3 The Kappa Statistic
The kappa statistic is used to measure the degree of agreement for categorical
variables. Consider the cross table shown in Figure 2.19 where the r rows are