Page 92 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

2.3 Summarising the Data   71


           A: The new variables NC and PRTGC can be computed using formulas similar to
           the formula used in Section 2.1.6 for computing PClass. Specifically for NC, given the
           values of the three N quartiles, 59 (25%), 78.5 (50%) and 95 (75%),
           NC, coded in {0, 1, 2, 3}, is computed as:

              NC = (N>59)+(N>78.5)+(N>95)
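
              As a minimal sketch of this coding rule, the Python function below sums the
           three comparisons exactly as in the formula. The function name quartile_code
           and the sample values of N are hypothetical, chosen only to exercise each
           class; they are not taken from the dataset.

```python
def quartile_code(n, q1=59.0, q2=78.5, q3=95.0):
    """Return the ordinal code in {0, 1, 2, 3} for the value n."""
    # Each comparison is True or False; converted to 1 or 0 and summed,
    # this mirrors NC = (N>59)+(N>78.5)+(N>95).
    return int(n > q1) + int(n > q2) + int(n > q3)


# Hypothetical N values (illustration only).
sample = [40, 59, 60, 78.5, 90, 95, 120]
codes = [quartile_code(n) for n in sample]
print(codes)  # [0, 0, 1, 1, 2, 2, 3]
```

              Because the comparisons are strict, a value equal to a quartile falls in the
           lower class.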

              The corresponding contingency table is shown in Table 2.10. Note that NC and
           PRTGC are ordinal variables since their ranks do indeed satisfy an order relation.
              The rank correlation coefficient computed for this table (see Commands 2.10) is
           0.715, which agrees fairly well with the 0.72 correlation computed for the
           corresponding continuous variables, as shown in Table 2.9.


           2.3.5.2 The Gamma Statistic

           Another measure of association for ordinal variables is based on a comparison of
           the values of both variables, X and Y, for all possible pairs of cases (x, y). Pairs of
           cases can be:

              –  Concordant (in rank order): The values of both variables for one case are
                 higher (or are both lower) than the corresponding values for the other case.
                 For instance, in Table 2.10 (X = NC; Y = PRTGC), the pair {(0, 0), (2, 1)} is
                 concordant.
              –  Discordant (in rank order): The value of one variable for one case is higher
                 than the corresponding value for the other case, and the direction is reversed
                 for the other variable. For instance, in Table 2.10, the pair {(0, 2), (3, 1)} is
                 discordant.
              –  Tied (in rank order): The two cases have the same value on one or on both
                 variables. For instance, in Table 2.10, the pair {(1, 2), (3, 2)} is tied.
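
              The three definitions above can be sketched as a small Python function. The
           name classify_pair is hypothetical; each case is assumed to be an (x, y)
           tuple of ordinal codes, and the three calls reuse the example pairs quoted
           in the list.

```python
def classify_pair(case_a, case_b):
    """Classify two cases (x, y) as 'concordant', 'discordant' or 'tied'."""
    dx = case_a[0] - case_b[0]
    dy = case_a[1] - case_b[1]
    # Equal values on one or on both variables make the pair tied.
    if dx == 0 or dy == 0:
        return "tied"
    # Same sign of both differences: both variables move in the same direction.
    return "concordant" if dx * dy > 0 else "discordant"


print(classify_pair((0, 0), (2, 1)))  # concordant
print(classify_pair((0, 2), (3, 1)))  # discordant
print(classify_pair((1, 2), (3, 2)))  # tied
```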

              The following γ measure of association (gamma coefficient) is defined:

$$\gamma = \frac{P(\text{Concordant}) - P(\text{Discordant})}{1 - P(\text{Tied})} = \frac{P(\text{Concordant}) - P(\text{Discordant})}{P(\text{Concordant}) + P(\text{Discordant})}. \tag{2.23}$$
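
              To illustrate that the two forms of formula 2.23 agree, the sketch below
           estimates the three probabilities by brute-force enumeration of all pairs
           of cases in a small contingency table. The function name gamma_from_table
           and the 2x2 table of counts are hypothetical (they are not Table 2.10), and
           the sketch assumes that at least one pair is not tied.

```python
from itertools import combinations


def gamma_from_table(table):
    """Return gamma computed with both denominators of formula 2.23."""
    # Expand the table of counts into one (row, column) pair per case.
    cases = [(i, j)
             for i, row in enumerate(table)
             for j, count in enumerate(row)
             for _ in range(count)]
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        else:
            tied += 1  # same value on one or on both variables
    total = concordant + discordant + tied
    p_c, p_d, p_t = concordant / total, discordant / total, tied / total
    return (p_c - p_d) / (1 - p_t), (p_c - p_d) / (p_c + p_d)


# Hypothetical 2x2 table of counts (illustration only).
table = [[3, 1],
         [2, 2]]
g1, g2 = gamma_from_table(table)
print(round(g1, 3), round(g2, 3))  # 0.5 0.5
```

              Here 6 of the 28 pairs are concordant, 2 are discordant and 20 are tied, so
           both forms give (6 − 2)/(6 + 2) = 0.5. Enumerating pairs is quadratic in the
           number of cases and is meant only as a check on the definition.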

              Let P and Q represent the total counts for the concordant and discordant cases,
           respectively. A point estimate of γ is then:

$$G = \frac{P - Q}{P + Q}, \tag{2.24}$$

           with P and Q computed from the counts $n_{ij}$ (of table cell ij) of a contingency table
           with r rows and c columns, as follows:

$$P = \sum_{i=1}^{r-1} \sum_{j=1}^{c-1} n_{ij} N_{ij}^{+}; \qquad Q = \sum_{i=1}^{r-1} \sum_{j=2}^{c} n_{ij} N_{ij}^{-}, \tag{2.25}$$