Page 273 - Well Logging and Formation Evaluation
The regression of y on x assumes that the x values in the data are always
correct and that the scatter occurs in the y variable. Similarly, the line of
regression of x on y may be derived simply by first setting x = (1/a)*y +
(-b/a) and using equations 13.24–13.27 in an identical way.
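As a rough illustration of the two regression directions described above, the following Python sketch fits the same data both ways. The function name `fit_line` and the sample data are invented for the example, and the closed-form least-squares slope and intercept used here are assumed to be equivalent to the equations cited in the text, which are not reproduced on this page.

```python
def fit_line(u, v):
    """Least-squares fit v = slope*u + intercept, treating u as exact."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u

# Made-up sample data (assumption, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression of y on x: x taken as correct, scatter assumed in y.
a_yx, b_yx = fit_line(x, y)

# Regression of x on y: fit x = c*y + d with the roles swapped,
# then rewrite it as y = a*x + b using c = 1/a and d = -b/a.
c, d = fit_line(y, x)
a_xy, b_xy = 1.0 / c, -d / c

print("y on x:", a_yx, b_yx)
print("x on y:", a_xy, b_xy)
```

The two lines coincide only when the points lie exactly on a straight line; otherwise the choice of which variable carries the scatter changes the fitted slope.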
A set of points on a plane may exhibit only a trend rather than a close
approximation to a straight line. The extent to which the points are
linearly related is specified quantitatively by the correlation
coefficient. This is given by:
$$r = a\,\frac{\sigma_x}{\sigma_y} \qquad \text{(A4.28)}$$
where a is the slope of the regression line of y on x and $\sigma_x$,
$\sigma_y$ are the standard deviations (the square roots of the variances)
of the x and y values about their means. In the more general case where
any function is used to describe y on x:
$$r = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y} \qquad \text{(A4.29)}$$
where $\sigma_{xy}$ is the covariance of x and y given by:
$$\sigma_{xy} = \frac{\sum (x - \mu_x)(y - \mu_y)}{n} \qquad \text{(A4.30)}$$
and $\mu_x$, $\mu_y$ are the means of the x and y values and n is the
number of data points. The correlation coefficient will be 1 when the
match is perfect between the model and the data (or -1 for a perfect
linear relation with negative slope), and zero if there is no correlation.
In practice, it is usually easiest to calculate the correlation within an
Excel™ spreadsheet. A convenient way to do the fitting where multiple
variables and a nonlinear equation are being used is as follows:
Set trial values of the relevant coefficients in cells in the spreadsheet.
Using these coefficients, calculate the model result ($y'$) at all values
of x for which a y value is available for comparison. In a new column,
calculate $(y - y')^2$ for each data point. At the bottom of this column,
create the sum of all the values. The fit is optimized when the set of
coefficients is found that minimizes this sum. This set can be found
automatically within Excel™ using the Solver add-in; the Goal Seek
function adjusts only a single cell toward a target value, so it is
suitable only when one coefficient is being fitted. Depending on the
complexity of the equation and the number of variables, it may be
necessary to constrain the ranges of the coefficients. Excel™ can also
return the correlation coefficient.
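The spreadsheet procedure above can also be sketched outside Excel. The Python example below is a minimal illustration under stated assumptions, not a description of what Excel does internally: the exponential model, the data, the trial values, and the bounds are all hypothetical, and the optimizer is a simple cyclic coordinate search with a golden-section line search, swapped in as a stand-in for the spreadsheet's optimizer. The correlation coefficient is computed from A4.29 and A4.30, here applied to the measured y values and the model values, which is one possible reading of the general case.

```python
import math

def sum_sq(model, coeffs, x, y):
    """Sum over all data points of (y - y')^2, where y' = model(x, coeffs)."""
    return sum((yi - model(xi, coeffs)) ** 2 for xi, yi in zip(x, y))

def golden_min(f, lo, hi, tol=1e-9):
    """Golden-section search for a minimum of f on the interval [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

def fit(model, trial, bounds, x, y, sweeps=300):
    """Adjust one coefficient at a time, within its bounds, to reduce sum_sq."""
    coeffs = list(trial)
    for _ in range(sweeps):
        for k, (lo, hi) in enumerate(bounds):
            def along_k(value):
                return sum_sq(model, coeffs[:k] + [value] + coeffs[k + 1:], x, y)
            best = golden_min(along_k, lo, hi)
            if along_k(best) < along_k(coeffs[k]):  # keep improvements only
                coeffs[k] = best
    return coeffs

def correlation(u, v):
    """r = s_uv / (s_u * s_v), following A4.29 and A4.30."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / n
    s_u = math.sqrt(sum((ui - mean_u) ** 2 for ui in u) / n)
    s_v = math.sqrt(sum((vi - mean_v) ** 2 for vi in v) / n)
    return s_uv / (s_u * s_v)

def model(x, coeffs):
    """Hypothetical nonlinear model y' = p0 * exp(p1 * x)."""
    p0, p1 = coeffs
    return p0 * math.exp(p1 * x)

# Made-up data, trial coefficients, and coefficient ranges (assumptions).
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.05, 2.52, 3.35, 4.18, 5.47]
trial = [1.0, 1.0]
bounds = [(0.1, 10.0), (0.0, 2.0)]

coeffs = fit(model, trial, bounds, x, y)
y_model = [model(xi, coeffs) for xi in x]
r = correlation(y, y_model)

print("coefficients:", coeffs)
print("sum of squares:", sum_sq(model, coeffs, x, y))
print("correlation coefficient:", r)
```

The bounds play the role of the constrained coefficient ranges mentioned in the text. A coordinate search of this kind is slow when the coefficients are strongly coupled, so a dedicated optimization library would normally be preferable for real data.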