Page 95 - Becoming Metric Wise
Using the correlation coefficient, a best fitting line can be rewritten,
more symmetrically, as:
\[
\frac{y - \bar{y}}{s_y} = r(x, y)\,\frac{x - \bar{x}}{s_x} \tag{4.16}
\]
It can be shown that -1 ≤ r(x, y) ≤ +1. If r is about zero there is no
linear relation between the variables x and y. If r is close to -1 or +1
there is a strong linear relation. For values between 0 and 1 the linear
relation is more or less strong and positive; for values between -1 and 0
the linear relation is similarly more or less strong and negative. Note that
it is possible that a weak linear relation corresponds to a strong nonlinear
relation, e.g., an exponential one.
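As a rough illustration of these statements, the following Python sketch computes r(x, y) from its definition and solves the symmetric form of the best fitting line for y. The function names and the data are made up for this example, and the code assumes that neither variable is constant (otherwise the denominator is zero). A parabola on a symmetric range is used as the nonlinear case because it makes the effect easy to see: y is completely determined by x, yet r is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r(x, y) of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of squared deviations and of cross-products; the 1/(n-1) factors cancel in r.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def line_from_r(xs, ys):
    """Slope and intercept obtained by solving (y - mean_y)/s_y = r (x - mean_x)/s_x for y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    slope = pearson_r(xs, ys) * s_y / s_x
    return slope, mean_y - slope * mean_x

# Hypothetical data: an exact line, and a parabola on a symmetric range.
x_line = [1, 2, 3, 4, 5]
y_line = [2 * x + 1 for x in x_line]
x_par = [-2, -1, 0, 1, 2]
y_par = [x ** 2 for x in x_par]

r_line = pearson_r(x_line, y_line)              # perfect linear relation, r close to +1
r_par = pearson_r(x_par, y_par)                 # strong nonlinear relation, r close to 0
slope, intercept = line_from_r(x_line, y_line)  # recovers the line y = 2x + 1
```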
4.7.3 Spearman Correlation
The Pearson correlation coefficient measures a linear relation and can be
highly sensitive to outliers. When outliers are present, or when the relation
is not expected to be linear, one prefers the Spearman correlation, which is
a robust measure of association. It is determined by ranking the observations
of each of the two variables separately (from largest to smallest or vice
versa; the direction does not matter as long as it is the same for both).
In case of ties, an average rank is used. The Spearman correlation
coefficient is then calculated in exactly the same way as the Pearson
correlation, but using ranks instead of the real observations. Accordingly,
the interpretation of the Spearman correlation differs from Pearson's: the
Pearson correlation coefficient is a measure of linearity, while Spearman's
is a measure of monotonicity, i.e., it determines whether or not the order
between the variables is preserved. Of course, a perfect linear relation is
monotone, but the converse does not hold.
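The procedure described above (rank each variable, average the ranks of ties, then apply the Pearson formula to the ranks) might be sketched in Python as follows. The function names and the sample values are assumptions made for this illustration only. The exponential data are monotone but not linear, so the Spearman coefficient should come out as 1 while the Pearson coefficient stays below 1.

```python
import math

def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def spearman_r(xs, ys):
    """Spearman correlation: the Pearson formula applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y grows exponentially with x (monotone, not linear).
x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]

r_pearson = pearson_r(x, y)             # below 1, the relation is not linear
r_spearman = spearman_r(x, y)           # 1, the order is fully preserved
tied = average_ranks([10, 20, 20, 30])  # the two tied values share rank 2.5
```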
It can be shown that the Spearman rank correlation coefficient R_S can
be calculated as:
\[
R_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{4.17}
\]
where d_i denotes the difference in ranking for the i-th item and n is the
number of items studied.
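A small worked example of Eq. (4.17), written as a Python sketch. The function name and the two rankings are invented for illustration. Note that this shortcut agrees exactly with the Pearson correlation of the ranks only when there are no ties; with tied (averaged) ranks the two calculations can differ slightly.

```python
def spearman_from_rank_differences(ranks_x, ranks_y):
    """Spearman coefficient from rank differences d_i, following Eq. (4.17)."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five items by two criteria (no ties).
ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 4, 3, 5]

# d_i = -1, 1, -1, 1, 0, so the sum of d_i^2 is 4 and R_S = 1 - 24/120 = 0.8.
r_s = spearman_from_rank_differences(ranks_x, ranks_y)

# A completely reversed ranking gives the minimum value of -1.
r_s_reversed = spearman_from_rank_differences(ranks_x, [5, 4, 3, 2, 1])
```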
4.8 NONPARAMETRIC LINEAR REGRESSION
Nonparametric linear regression is a distribution-free method for investigating
a linear relationship between two variables Y (dependent, outcome) and