Page 40 - Industrial Process Plant Construction Estimating and Man Hour Analysis
1.5 Correlation
1.5.1 Correlation coefficient
(1) Measures the strength and direction of a linear relationship between two
variables.
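(2) The mathematical formula for computing r is
$$ r = \frac{n\sum XY - \sum X \sum Y}{\left[\left(n\sum X^2 - \left(\sum X\right)^2\right)\left(n\sum Y^2 - \left(\sum Y\right)^2\right)\right]^{1/2}} $$
where n is the number of (X, Y) data pairs.
(3) The value of r is such that (−1 ≤ r ≤ +1). The + and − signs are used for
positive and negative linear correlations, respectively.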
(4) Positive correlation: If X and Y have a strong positive linear correlation, r is
close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive
values indicate a relationship between the X and Y variables such that as
values for X increase, values for Y also increase.
(5) A perfect correlation of +1 or −1 occurs only when the data points all lie
exactly on a straight line. If r = +1, the slope of the line is positive. If r = −1, the
slope of the line is negative.
(6) A correlation with magnitude greater than 0.8 is generally described as strong,
whereas a correlation with magnitude less than 0.5 is generally described as weak.
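The computing formula in item (2) can be sketched in a few lines of Python. This is a minimal illustration, not a method given in the text: the function name `pearson_r` and the sample data are hypothetical, chosen only so that the points lie exactly on a straight line, as described in item (5).

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from the raw-sum computing formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # r is undefined when either variable has no variation.
        raise ValueError("r is undefined when X or Y is constant")
    return numerator / denominator


# Hypothetical data: every point lies on a straight line.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 6, 8, 10]   # Y increases with X, so r should be close to +1
ys_falling = [10, 8, 6, 4, 2]  # Y decreases with X, so r should be close to -1

print(pearson_r(xs, ys_rising))
print(pearson_r(xs, ys_falling))
```

The guard on a zero denominator is a design choice of this sketch: when either variable is constant, the formula divides by zero and no linear correlation can be reported.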
1.5.2 Coefficient of determination, r²
(1) It is useful because it gives the proportion of the variance of one variable that is
predictable from the other variable. It is a measure that allows us to determine
how confident one can be in predictions made from a given model or graph.
(2) The coefficient of determination is the ratio of the explained variation to
the total variation.
(3) The coefficient of determination is such that (0 ≤ r² ≤ 1) and denotes
the strength of the linear association between X and Y.
(4) The coefficient of determination represents the percentage of the variation in the
data that is accounted for by the line of best fit. For example, if r = 0.922, then
r² = 0.850, which means that 85% of the total variation in Y can be explained by the
linear relationship between X and Y (as described by the regression equation).
The other 15% of the total variation in Y remains unexplained.
(5) The coefficient of determination is a measure of how well the regression
line represents the data. If the regression line passes exactly through every
point on the scatter plot, it explains all of the variation. The farther
the line is from the points, the less it is able to explain.
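The ratio in item (2), explained variation over total variation, can be illustrated with a short Python sketch. It assumes an ordinary least-squares line of Y on X, which the text does not spell out here; the function name `r_squared` and the sample data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination: explained variation / total variation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares regression line of Y on X (an assumption of this sketch).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("X is constant, so no regression line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    # Explained variation: how far the fitted values move away from the mean of Y.
    explained = sum((p - mean_y) ** 2 for p in predicted)
    # Total variation: how far the observed values move away from the mean of Y.
    total = sum((y - mean_y) ** 2 for y in ys)
    if total == 0:
        raise ValueError("Y is constant, so there is no variation to explain")
    return explained / total


# Hypothetical scattered data: the line explains only part of the variation.
print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))

# Hypothetical data lying exactly on a line: all of the variation is explained.
print(r_squared([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))

# The worked example from item (4): r = 0.922 gives r² of about 0.850.
print(round(0.922 ** 2, 3))
```

For the scattered data the result is about 0.6, so roughly 60% of the variation in Y is explained by the line and the remaining 40% is not.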