Page 71 - Computational Statistics Handbook with MATLAB

Chapter 3: Sampling Concepts


                             independent, then $\rho_{X,Y} = 0$. Note that the converse of this statement does
                             not necessarily hold.
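
                             To see why the converse can fail, consider the following minimal sketch. The
                             values of x and y are hypothetical, chosen only for illustration: y is
                             completely determined by x, so the two are dependent, yet the sample
                             correlation computed by corrcoef is zero up to rounding because the
                             relationship is symmetric rather than linear.

```matlab
% Hypothetical data (an assumption for illustration only):
% x is symmetric about zero and y is a deterministic
% function of x, so the two variables are clearly dependent.
x = -3:3;
y = x.^2;
% corrcoef returns a 2-by-2 matrix; the off-diagonal
% element is the sample correlation between x and y.
R = corrcoef(x, y);
r = R(1,2);
disp(r)
```

                             The correlation coefficient measures only linear association, which is why
                             it does not detect the quadratic dependence here.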
                              There are statistics that can be used to estimate these quantities. Let’s say
                             we have a random sample of size $n$ denoted as $(X_1, Y_1), \ldots, (X_n, Y_n)$. The
                             sample covariance is typically calculated using the following statistic
\[
\hat{\sigma}_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}). \tag{3.11}
\]
                             This is the definition used in the MATLAB function cov. In some instances,
                             the empirical covariance is used [Efron and Tibshirani, 1993]. This is similar
                             to Equation 3.11, except that we divide by $n$ instead of $n - 1$. The sample
                             correlation coefficient for two variables is given by

\[
\hat{\rho}_{X,Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^{1/2} \left( \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}}. \tag{3.12}
\]
                             In the next example, we investigate the commands available in MATLAB that
                             return the statistics given in Equations 3.11 and 3.12. It should be noted that
                             the quantity in Equation 3.12 is also bounded below by $-1$ and above by 1.
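
                             As a quick check on Equations 3.11 and 3.12, the sketch below evaluates
                             both formulas directly and compares them with cov and corrcoef. The sample
                             values and the variable names (sxy, sxy_emp, rxy) are assumptions made up
                             for illustration; they do not come from the text.

```matlab
% Hypothetical paired sample, used only for illustration.
x = [1 2 3 4 5 6]';
y = [2.1 3.9 6.2 7.8 10.1 12.2]';
n = length(x);
xbar = mean(x);
ybar = mean(y);
% Sum of cross products of the deviations from the means.
sxp = sum((x - xbar).*(y - ybar));
% Equation 3.11: sample covariance with the n - 1 divisor.
sxy = sxp/(n - 1);
% Empirical covariance: the same sum divided by n.
sxy_emp = sxp/n;
% Equation 3.12: sample correlation coefficient.
rxy = sxp/(sqrt(sum((x - xbar).^2))*sqrt(sum((y - ybar).^2)));
% Built-in functions. Each returns a 2-by-2 matrix whose
% (1,2) element is the cross term between x and y.
C = cov(x, y);
C1 = cov(x, y, 1);   % third argument 1 requests the n divisor
R = corrcoef(x, y);
% Each row should show two matching values.
disp([sxy C(1,2); sxy_emp C1(1,2); rxy R(1,2)])
```

                             The two covariance estimates differ only by the factor $(n - 1)/n$, so the
                             distinction matters mainly for small samples.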

                             Example 3.2
                             In this example, we show how to use the MATLAB cov function to find the
                             covariance between two variables and the corrcoef function to find the
                             correlation coefficient. Both of these functions are available in the standard
                             MATLAB language. We use the cement data [Hand, et al., 1994], which were
                             analyzed by Hald [1952], to illustrate the basic syntax of these functions. The
                             relationship between the two variables is nonlinear, so Hald looked at the log
                             of the tensile strength as a function of the reciprocal of the drying time. When
                             the cement data are loaded, we get a vector x representing the drying times
                             and a vector y that contains the tensile strength. A scatterplot of the trans-
                             formed data is shown in Figure 3.1.
                                % First load the data.
                                load cement
                                % Now get the transformations.
                                xr = 1./x;
                                logy = log(y);
                                % Now get a scatterplot of the data to see if
                                % the relationship is linear.
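
                             The listing above stops at the page boundary. The following is a minimal
                             sketch of how the described steps (transform, plot, then call cov and
                             corrcoef) might look. It is not the book's code: the x and y values are
                             made-up stand-ins for the cement data, and the axis labels and the names
                             cov_xy and rho_xy are assumptions.

```matlab
% Hypothetical stand-in values; these are NOT the cement data.
x = [1 2 3 7 28]';        % drying times (made up)
y = [13 22 30 42 45]';    % tensile strengths (made up)
% Same transformations as in the listing above.
xr = 1./x;
logy = log(y);
% Scatterplot of the transformed data.
plot(xr, logy, 'x')
xlabel('Reciprocal Drying Time')
ylabel('Log of Tensile Strength')
% cov and corrcoef each return a 2-by-2 matrix for two
% vector inputs; the off-diagonal element is the quantity
% relating the two variables.
C = cov(xr, logy);
R = corrcoef(xr, logy);
cov_xy = C(1,2);
rho_xy = R(1,2);
```

                             With two vector inputs, the diagonal of the matrix returned by cov holds the
                             sample variances of each variable, and the diagonal of the matrix returned
                             by corrcoef is all ones.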


                            © 2002 by Chapman & Hall/CRC