Page 96 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data


           with the highest frequency of occurrence (37.9%). In choosing this modal category,
           we expect to be in error 62.1% of the time. On the other hand, if we know the sex
           (i.e., we know the full table), we would choose as prediction outcome the “agree”
           category if the respondent is male (expecting then 73.5 – 28 = 45.5% of errors), and the
           “fully agree” category if the respondent is female (expecting then 26.5 – 11.4 = 15.1%
           of errors).
              Let us denote:

              i.  $Pe_c$ ≡ Percentage of errors using only the columns = 100 – percentage of
                 the modal column category.
              ii.  $Pe_{cr}$ ≡ Percentage of errors using also the rows = sum along the rows of
                 (row percentage – percentage of the modal column category in that row), with
                 all percentages taken relative to the table total.

              The λ measure (Goodman and Kruskal lambda) of proportional reduction of
           error, when predicting the columns from the rows, is defined as:

              $$\lambda_{cr} = \frac{Pe_c - Pe_{cr}}{Pe_c}. \tag{2.27}$$

              Similarly, for the prediction of the rows from the columns, we have:

              $$\lambda_{rc} = \frac{Pe_r - Pe_{rc}}{Pe_r}. \tag{2.28}$$

              The  coefficient of mutual association (also called  symmetric lambda) is a
           weighted average of both lambdas, defined as:

              $$\lambda = \frac{\text{average reduction in errors}}{\text{average number of errors}} = \frac{(Pe_c - Pe_{cr}) + (Pe_r - Pe_{rc})}{Pe_c + Pe_r}. \tag{2.29}$$

              The lambda measure always ranges between 0 and 1, with 0 meaning that the
           independent variable is of no  help in predicting the dependent variable and  1
           meaning that the independent variable perfectly specifies the categories of the
           dependent variable.
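
The three lambda measures can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular statistics package: the function name `lambda_statistics` and the count table are hypothetical, and the table is assumed to hold raw cell counts with at least two non-empty rows and two non-empty columns, so that neither error percentage in the denominators is zero.

```python
def lambda_statistics(counts):
    """Goodman and Kruskal lambdas for a table of cell counts (list of rows).

    Returns (lambda_cr, lambda_rc, symmetric lambda), following
    formulas 2.27, 2.28 and 2.29.
    """
    total = sum(sum(row) for row in counts)
    columns = list(zip(*counts))              # transpose: one tuple per column
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in columns]

    # Error percentages, all relative to the table total.
    pe_c = 100.0 * (total - max(col_totals)) / total                # columns only
    pe_cr = 100.0 * (total - sum(max(r) for r in counts)) / total   # columns, knowing the row
    pe_r = 100.0 * (total - max(row_totals)) / total                # rows only
    pe_rc = 100.0 * (total - sum(max(c) for c in columns)) / total  # rows, knowing the column

    # Note: pe_c or pe_r equal to zero (all cases in one category)
    # would make the ratios undefined; that case is not handled here.
    lambda_cr = (pe_c - pe_cr) / pe_c
    lambda_rc = (pe_r - pe_rc) / pe_r
    lambda_sym = ((pe_c - pe_cr) + (pe_r - pe_rc)) / (pe_c + pe_r)
    return lambda_cr, lambda_rc, lambda_sym


# Hypothetical 2 x 3 table of counts (not taken from the book).
hypothetical_counts = [
    [30, 10, 5],
    [5, 20, 30],
]
lam_cr, lam_rc, lam = lambda_statistics(hypothetical_counts)
print(round(lam_cr, 3), round(lam_rc, 3), round(lam, 3))
```

For this made-up table, knowing the row reduces the column prediction errors from 65% to 40%, and knowing the column reduces the row prediction errors from 45% to 20%; the symmetric lambda pools both reductions over the pooled errors.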

           Example 2.10

           Q: Compute the lambda statistics for Table 2.4.
           A: Using formula 2.27 we find $\lambda_{cr}$ = 0.024, suggesting that sex contributes
           very little to predicting the outcome of Q4. We also find $\lambda_{rc}$ = 0 and $\lambda$ = 0.017.
           The significance of the lambda statistic will be discussed in Chapter 5.
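
The value obtained with formula 2.27 can be checked by hand from the percentages quoted at the start of this discussion. The short sketch below assumes those percentages (37.9% for the modal category, 73.5% and 28% for males, 26.5% and 11.4% for females) refer to Table 2.4; the variable names are illustrative only.

```python
# Errors when predicting Q4 from the column totals alone.
pe_c = 100 - 37.9

# Errors when the sex is also known: male row plus female row,
# each as (row percentage - modal cell percentage).
pe_cr = (73.5 - 28.0) + (26.5 - 11.4)

# Formula 2.27: proportional reduction of error.
lambda_cr = (pe_c - pe_cr) / pe_c
print(round(lambda_cr, 3))  # 0.024
```

Knowing the sex lowers the expected errors only from 62.1% to 60.6%, which is why the proportional reduction is so small.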



           2.3.6.3  The Kappa Statistic
           The  kappa statistic is used to measure the  degree of  agreement for categorical
           variables. Consider the cross table shown  in Figure 2.19 where the  r rows are