Page 220 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
5.3 Inference on Two Populations
5.3.1 Tests for Two Independent Samples
Commands 5.8. SPSS, STATISTICA, MATLAB and R commands used to
perform non-parametric tests on two independent samples.

SPSS        Analyze; Nonparametric Tests; 2 Independent Samples
STATISTICA  Statistics; Nonparametrics; Comparing two independent samples (groups)
MATLAB      [p,h,stats]=ranksum(x,y,alpha)
R           ks.test(x,y) ; wilcox.test(x,y) | wilcox.test(x~y)
5.3.1.1 The Kolmogorov-Smirnov Two-Sample Test
The Kolmogorov-Smirnov test is used to assess whether two independent samples
were drawn from the same population or from populations with the same
distribution, for the variable X being tested, which is assumed to be continuous. Let
F(x) and G(x) represent the unknown distributions for the two independent
samples. The null hypothesis is formalised as:
$H_0$: Data variable X has equal cumulative probability distributions for the two
samples: $F(x) = G(x)$.
The test is conducted in a way similar to that described in section 5.1.4. Let $S_m(x)$
and $S_n(x)$ represent the empirical distributions of the two samples, with sizes m and
n, respectively. As test statistic we use the maximum deviation between these
empirical distributions:
$$D_{m,n} = \max_x \left| S_n(x) - S_m(x) \right|. \tag{5.29}$$
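The statistic in 5.29 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the routine used by any of the packages above; the function name `ks_two_sample_statistic` and the data are hypothetical. It relies on the fact that both empirical distributions are step functions, so the maximum deviation is attained at one of the observed values.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Return D_{m,n} = max |S_n(t) - S_m(t)| over all observed values t."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical distributions only change at observed values,
    # so it is enough to evaluate the deviation at those points.
    for t in xs + ys:
        s_m = bisect_right(xs, t) / m  # fraction of x values <= t
        s_n = bisect_right(ys, t) / n  # fraction of y values <= t
        d = max(d, abs(s_n - s_m))
    return d


# Hypothetical data, for illustration only (not taken from the text).
x = [1.2, 2.4, 3.1, 4.8, 5.0]
y = [3.5, 4.1, 5.6, 6.2, 7.3, 8.0]
d = ks_two_sample_statistic(x, y)
print(round(d, 3))  # 0.667
```

For these data the largest gap occurs at t = 5.0, where all of x has been observed (S_m = 1) but only two of the six y values (S_n = 1/3).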
For large samples (say, m and n above 25) and two-tailed tests (the most usual),
the significance of $D_{m,n}$ can be evaluated using the critical values obtained with the
expression:
$$c \sqrt{\frac{m+n}{mn}}, \tag{5.30}$$
where c is a coefficient that depends on the significance level, namely c = 1.36 for
α = 0.05 (for details, see e.g. Siegel S, Castellan Jr NJ, 1998).
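As a rough illustration of how the large-sample critical value 5.30 might be applied, the Python sketch below computes it for given sample sizes and compares it with an observed statistic. The function name `ks_critical_value`, the sample sizes and the observed value are hypothetical; only c = 1.36 for α = 0.05 comes from the text. The sketch assumes that the null hypothesis is rejected when the observed statistic equals or exceeds the critical value.

```python
from math import sqrt


def ks_critical_value(m, n, c=1.36):
    """Large-sample, two-tailed critical value c * sqrt((m + n) / (m * n)).

    The default c = 1.36 corresponds to alpha = 0.05, as stated in the text.
    """
    if m <= 0 or n <= 0:
        raise ValueError("sample sizes must be positive")
    return c * sqrt((m + n) / (m * n))


# Hypothetical sample sizes and observed statistic, for illustration only.
m, n = 30, 40
d_observed = 0.35

d_crit = ks_critical_value(m, n)
# Assumed decision rule: reject H0 when the observed deviation
# is at least as large as the critical value.
reject_h0 = d_observed >= d_crit
print(round(d_crit, 4), reject_h0)
```

With m = 30 and n = 40 the critical value is about 0.33, so an observed deviation of 0.35 would lead to rejecting the null hypothesis at the 5% level.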
When compared with its parametric counterpart, the t test, the Kolmogorov-
Smirnov test has a high power-efficiency of about 95%, even for small samples.