Page 273 - Well Logging and Formation Evaluation

Additional Mathematics Theory           263

               The regression of y on x assumes that the x values in the data are always
            correct and that the scatter occurs in the y variable. Similarly, the line of
            regression of x on y may be derived simply by first setting x = (1/a)*y +
            (-b/a) and using equations A4.24–A4.27 in an identical way.
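               The two regression lines can be compared with a short sketch. It assumes
            the standard least-squares formulas for slope and intercept, since the
            regression equations cited above are not reproduced in this section. The
            function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(u, v):
    """Least-squares line v = a*u + b, treating u as exact and v as scattered."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    a = s_uv / s_uu
    b = mean_v - a * mean_u
    return a, b


# Hypothetical data with a little scatter (not taken from the text)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression of y on x: y = a*x + b, scatter assumed to be in y
a_yx, b_yx = fit_line(x, y)

# Regression of x on y: x = c*y + d, scatter assumed to be in x
c, d = fit_line(y, x)

# Re-express the x-on-y line as y = a*x + b, using c = 1/a and d = -b/a
a_xy, b_xy = 1.0 / c, -d / c

print("y on x:", a_yx, b_yx)
print("x on y:", a_xy, b_xy)
```

            For scattered data the two lines generally differ. They coincide only when
            the points lie exactly on a straight line.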
               A set of points on a plane may exhibit only a trend rather than a close
            approximation to a straight line. The extent to which the points are lin-
            early related is specified quantitatively by the correlation coefficient. This
            is given by:

               $r = a \, \sigma_x / \sigma_y$                                (A4.28)

            where a is the slope of the regression line of y on x and $\sigma_x$, $\sigma_y$
            are the standard deviations of the x and y values about their means. In
            the more general case where any function is used to describe y on x:

               $r = \sigma_{xy} / (\sigma_x \sigma_y)$                       (A4.29)

            where $\sigma_{xy}$ is the covariance of x and y given by:


               $\sigma_{xy} = \sum (x - m_x)(y - m_y) / n$                   (A4.30)

            and $m_x$, $m_y$ are the means of the x and y values. The correlation coefficient
            will be 1 (or -1 for a perfect linear trend with negative slope) when the
            match between the model and the data is perfect, and zero if there is no
            correlation. In practice, it is usually easiest to do the correlation within
            an Excel™ spreadsheet. A convenient way to do the fitting where multiple
            variables and a nonlinear equation are being used is as follows:
               Set trial values of the relevant coefficients in cells in the spreadsheet.
            Using these coefficients, calculate the model result ($y'$) at all values of x
            for which a y value is available for comparison. In a new column, calculate
            $(y - y')^2$ for each data point. At the bottom of this column, create the
            sum of all the values. The fit is optimized when the set of coefficients is
            found that minimizes this sum. This set can be found automatically within
            Excel™ using the Solver add-in; the Goal Seek function changes only one
            cell to reach a specified target value and so cannot search over several
            coefficients. Depending on the complexity of the equation and the number of
            variables, it may be necessary to constrain the ranges of the coefficients.
            Excel™ can also return the correlation coefficient, for example with the
            CORREL function.
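               The spreadsheet procedure above can also be sketched outside Excel. This
            is a minimal illustration only, not the spreadsheet's own algorithm. The
            exponential model, the noise-free sample data, the function names, and the
            simple compass search used as the minimizer are all assumptions made for
            this example.

```python
import math


def model(x, coeffs):
    """Hypothetical nonlinear model y' = c0 * exp(c1 * x); any function could be used."""
    c0, c1 = coeffs
    return c0 * math.exp(c1 * x)


def sum_sq(coeffs, xs, ys):
    """Sum of (y - y')^2 over all data points (the total at the bottom of the column)."""
    return sum((y - model(x, coeffs)) ** 2 for x, y in zip(xs, ys))


def compass_search(xs, ys, start, bounds, step=0.5, tol=1e-8, max_iter=50000):
    """Minimize sum_sq by trying +/- step on each coefficient in turn.

    Any improvement is kept; when no move improves the sum, the step is halved.
    Each coefficient is clipped to its (low, high) range, which plays the role
    of the constraints mentioned in the text.
    """
    best = list(start)
    best_val = sum_sq(best, xs, ys)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i, (low, high) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = min(high, max(low, trial[i] + delta))
                val = sum_sq(trial, xs, ys)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
    return best, best_val


def correlation(u, v):
    """r = s_uv / (s_u * s_v): covariance divided by both standard deviations."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    s_u = math.sqrt(sum((a - mean_u) ** 2 for a in u) / n)
    s_v = math.sqrt(sum((b - mean_v) ** 2 for b in v) / n)
    return s_uv / (s_u * s_v)


# Hypothetical noise-free data from y = 2 * exp(0.2 * x), so the answer is known
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * math.exp(0.2 * x) for x in xs]

# Trial coefficients and allowed ranges (both assumed for illustration)
coeffs, residual = compass_search(
    xs, ys, start=[1.0, 0.0], bounds=[(0.0, 10.0), (-1.0, 1.0)]
)

# Correlation between the data and the fitted model values
r = correlation(ys, [model(x, coeffs) for x in xs])

print("coefficients:", coeffs)
print("sum of squares:", residual)
print("correlation:", r)
```

            With real, noisy data the minimized sum will not reach zero, and a search
            this simple may need different starting values or ranges to find the best
            fit.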