Page 273 - Well Logging and Formation Evaluation


Additional Mathematics Theory

The regression of y on x assumes that the x values in the data are always
correct and that the scatter occurs in the y variable. Similarly, the line of
regression of x on y may be derived simply by first setting x = (1/a)y - b/a
and using equations 13.24–13.27 in an identical way.
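
The difference between the two regressions can be illustrated with a minimal Python sketch. It assumes that the referenced equations are the ordinary least-squares formulas for slope and intercept; the helper name `fit_line` and the data values are hypothetical and serve only as an illustration.

```python
def fit_line(u, v):
    """Least-squares line v = slope * u + intercept (scatter assumed in v)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    return slope, intercept

# Made-up illustrative data, not taken from any log
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression of y on x: y = a*x + b, scatter assumed in y
a, b = fit_line(x, y)

# Regression of x on y: fit x = c*y + d with the roles swapped,
# then identify c = 1/a and d = -b/a to express it as y = a_xy*x + b_xy
c, d = fit_line(y, x)
a_xy, b_xy = 1.0 / c, -d / c

print("y on x:", a, b)
print("x on y:", a_xy, b_xy)
```

The two lines coincide only when the points lie exactly on a straight line; otherwise the slopes differ, because each regression attributes the scatter to a different variable.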
A set of points on a plane may exhibit only a trend rather than a close
approximation to a straight line. The extent to which the points are linearly
related is specified quantitatively by the correlation coefficient. This
is given by:

$$ r = a \, \frac{\sigma_x}{\sigma_y} \tag{A4.28} $$

where a is the slope of the regression line of y on x, and $\sigma_x$, $\sigma_y$ are
the standard deviations of the x and y values about their means. In
the more general case where any function is used to describe y on x:

$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{A4.29} $$

where $\sigma_{xy}$ is the covariance of x and y, given by:

$$ \sigma_{xy} = \frac{\sum (x - \mu_x)(y - \mu_y)}{n} \tag{A4.30} $$

and $\mu_x$, $\mu_y$ are the means of the x and y values. The correlation coefficient
will be 1 when the match is perfect between the model and the data, and
zero if there is no correlation. In practice, it is usually easiest to do
correlation within an Excel™ spreadsheet. A convenient way to do the fitting,
where multiple variables and a nonlinear equation are being used, is as
follows:
Set trial values of the relevant coefficients in cells in the spreadsheet.
Using these coefficients, calculate the model result (y′) at all values of x
for which a y value is available for comparison. In a new column, calculate
(y - y′)² for each data point. At the bottom of this column, create the
sum of all the values. The fit is optimized when the set of
coefficients is found that minimizes this sum. This set can be found
automatically within Excel™ using the Solver add-in; the Goal Seek™ function
adjusts only a single cell toward a target value, so it is suited only to a
fit with one coefficient. Depending on the complexity of the equation and the
number of variables, it may be necessary to constrain the ranges of the
coefficients. Excel™ can also return the correlation coefficient (for example,
with the CORREL function).
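
The same procedure can be sketched outside a spreadsheet. The Python example below is a minimal illustration under stated assumptions, not a reproduction of what Excel™ does internally: the power-law model `y' = p * x**q`, the synthetic data, the starting values, and the bounds are all hypothetical, and a simple bounded compass (pattern) search stands in for the spreadsheet optimizer. The correlation coefficient between the data and the fitted model follows equations A4.29 and A4.30.

```python
import math

def model(x, coeffs):
    # Hypothetical nonlinear model, chosen only for illustration: y' = p * x**q
    p, q = coeffs
    return p * x ** q

def sum_of_squares(coeffs, xs, ys):
    # Equivalent of the cell at the bottom of the column: sum of (y - y')^2
    return sum((y - model(x, coeffs)) ** 2 for x, y in zip(xs, ys))

def compass_search(objective, start, bounds, step=0.5, tol=1e-8, max_iter=20000):
    # Try +/- step on each coefficient in turn, keep any move that lowers the
    # objective, and halve the step once no move helps. Bounds are enforced
    # by clamping each trial value to its allowed range.
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = min(max(trial[i] + delta, lo), hi)
                val = objective(trial)
                if val < best_val:
                    best, best_val = trial, val
                    improved = True
        if not improved:
            step /= 2.0
    return best, best_val

def correlation(u, v):
    # r = cov(u, v) / (sd_u * sd_v), as in equations A4.29 and A4.30
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / n
    sd_u = math.sqrt(sum((a - mu_u) ** 2 for a in u) / n)
    sd_v = math.sqrt(sum((b - mu_v) ** 2 for b in v) / n)
    return cov / (sd_u * sd_v)

# Synthetic data generated from the assumed model with p = 2, q = 1.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x ** 1.5 for x in xs]

trial = [1.0, 1.0]                    # trial coefficient values
bounds = [(0.1, 10.0), (0.0, 3.0)]    # constrained ranges for p and q
best, best_sum = compass_search(lambda c: sum_of_squares(c, xs, ys), trial, bounds)
r = correlation(ys, [model(x, best) for x in xs])

print("coefficients:", best)
print("sum of squares:", best_sum)
print("correlation coefficient:", r)
```

A compass search is used here only because it needs no external library. It is slow when the coefficients are strongly correlated, so a dedicated optimizer would normally be preferred for larger problems.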
