Page 94 - Becoming Metric Wise
dependent variable. When the independent variable is time, the scatterplot
represents evolution of a variable over time.
It is possible to determine a best fitting line for a scatterplot. If the
independent variable is denoted as x, and the dependent one as y, then
this best fitting line has the equation
\[
y = a + bx \tag{4.12}
\]
Here a is called the intercept and b is called the slope. When the slope is
positive the line is increasing, and when it is negative the line is decreasing.
When b is zero, y is constant. The best fitting line through the scatterplot
$(x_i, y_i)_{i=1,\dots,n}$ is obtained as follows:
\[
b = \frac{\dfrac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\dfrac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2} \tag{4.13}
\]
and
\[
a = \bar{y} - b\,\bar{x} \tag{4.14}
\]
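As an illustration, the slope and intercept formulas (4.13) and (4.14) can be sketched in a few lines of Python. The function name `fit_line` and the sample data are assumptions made up for this example; they do not come from the text, and the sketch uses only the standard library.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x, following (4.13) and (4.14)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    denom = mean_x2 - mean_x ** 2              # denominator of (4.13)
    if denom == 0:
        raise ValueError("all x values coincide, so the slope is undefined")
    b = (mean_xy - mean_x * mean_y) / denom    # slope, (4.13)
    a = mean_y - b * mean_x                    # intercept, (4.14)
    return a, b


# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f} x")   # prints: y = 2.20 + 0.60 x
```

For these five points the means are 3 and 4, the numerator of (4.13) is 13.2 - 12 = 1.2 and the denominator is 11 - 9 = 2, which gives b = 0.6 and a = 4 - 0.6 * 3 = 2.2.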
A best fitting line, often called a regression line, can always be
calculated, even if the scatterplot has no linear appearance at all. For this
reason, a measure of the quality of the fit of the regression line to the
scatterplot is calculated. This measure is called the Pearson correlation
coefficient.
4.7.2 Pearson Correlation
The Pearson correlation coefficient is given as:
\[
r(x, y) = \frac{b\, s_x}{s_y}
= \frac{\dfrac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2}\;\sqrt{\dfrac{1}{n}\sum_{i=1}^{n} y_i^2 - (\bar{y})^2}} \tag{4.15}
\]
As the standard deviations $s_x$ and $s_y$ are always positive, this equation
shows that the correlation coefficient and the slope of the regression line
have the same sign. Of course, neither the calculation of a regression line,
nor that of the correlation coefficient is usually done by hand. One uses a
software package or a pocket calculator.
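As a minimal sketch of what such software does, the following Python code computes the slope and the Pearson correlation coefficient directly from the definitions and checks that the two have the same sign. The function name `slope_and_r` and all data are hypothetical, chosen only for illustration; the third data set is deliberately non-linear to show that a regression line can still be calculated while the correlation signals a poor linear fit.

```python
import math


def slope_and_r(xs, ys):
    """Return (b, r): regression slope and Pearson correlation, as in (4.13) and (4.15)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("both variables must take at least two distinct values")
    s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)   # standard deviations
    b = cov / var_x                                 # slope
    r = cov / (s_x * s_y)                           # Pearson correlation
    # The identity r = b * s_x / s_y holds up to rounding error.
    assert math.isclose(r, b * s_x / s_y, abs_tol=1e-12)
    return b, r


# Made-up data: increasing, decreasing, and a symmetric parabola.
xs = [1, 2, 3, 4, 5]
b_up, r_up = slope_and_r(xs, [2, 4, 5, 4, 5])
b_down, r_down = slope_and_r(xs, [5, 4, 5, 4, 2])
b_par, r_par = slope_and_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(f"increasing: b = {b_up:.2f}, r = {r_up:.3f}")
print(f"decreasing: b = {b_down:.2f}, r = {r_down:.3f}")
print(f"parabola:   b = {b_par:.2f}, r = {r_par:.3f}")
```

In the first two cases the slope and the correlation coefficient are both positive or both negative. In the parabola case a regression line exists (it is horizontal), but the correlation is zero, which indicates that a straight line does not describe the data.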