90 CHAPTER 4 Statistical analysis
correlated does not necessarily mean that the changes in one variable cause the
changes in the other variable. In some cases, there is a causal relationship between the
two variables. In other cases, there is a hidden variable (also called the “intervening”
variable, which is one type of confounding variable) that serves as the underlying
cause of the change.
For example, in an experiment that studies how users interact with an e-commerce
website, you may find a significant correlation between income and performance.
More specifically, participants with higher incomes take longer to find a
specific item and make more errors during the navigation process. Can you claim
that earning a higher income causes people to take longer to retrieve online
items and to make more errors? The answer is obviously no. The truth might be that
people who earn a higher income tend to be older than those who earn a lower income.
People in the older age group do not use computers as intensively as those in the
younger age group, especially when it comes to activities such as online shopping.
Consequently, they may take longer to find items and make more errors. In
this case, age is the intervening variable that is hidden behind the two variables
examined in the correlation. Although income and performance are significantly
correlated, there is no causal relationship between them. A correct interpretation of the
relationship between the variables is shown in Figure 4.2.
[Diagram: Age is linked to Income and, through Less experience in online purchase, to Lower performance.]
FIGURE 4.2
Relationship between correlated variables and an intervening variable.
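The pattern in Figure 4.2 can be illustrated with a small simulation. The sketch below is not taken from the study described above: the sample size, coefficients, and noise levels are invented for illustration, and the helper names (pearson, partial_corr, simulate) are hypothetical. It generates income and task time so that each depends only on age, then compares the raw correlation with the partial correlation that controls for age.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=2000, seed=0):
    """Made-up data: income and task time both depend on age, not on each other."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 70) for _ in range(n)]
    # Assumed (illustrative) linear effects of age plus independent noise.
    income = [30 + 1.0 * a + rng.gauss(0, 10) for a in age]
    task_time = [20 + 0.8 * a + rng.gauss(0, 10) for a in age]
    return age, income, task_time


age, income, task_time = simulate()
r_raw = pearson(income, task_time)
r_partial = partial_corr(income, task_time, age)
print(f"income vs. task time:        r = {r_raw:.2f}")
print(f"controlling for age: partial r = {r_partial:.2f}")
```

Under these assumptions, the raw correlation between income and task time should come out clearly positive, while the partial correlation should be close to zero, which is what you would expect if age alone accounts for the association.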
This example demonstrates the danger of claiming a causal relationship based
on a significant correlation. In data analysis, it is not uncommon for researchers to
conduct pairwise correlation tests on all variables involved and then claim that
“variable A has a significant impact on variable B” or “the changes in variable
A cause variable B to change,” claims that can be spurious in many cases. To avoid
this mistake, keep in mind that empirical studies should be driven
by hypotheses, not data. That is, your analysis should be based on a predefined
hypothesis, not the other way around. In the earlier example, you are unlikely to
develop a hypothesis that “income has a significant impact on online purchasing
performance” because it does not make much sense. If your study is hypothesis-driven,
you are far less likely to be fooled by correlation analysis results. On the other hand,
if you do not have a clearly defined hypothesis before the study, you will derive
hypotheses from the data analysis, making it more likely that you will draw
false conclusions.
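One way to see why data-driven pairwise testing invites false conclusions is to run it on variables that are unrelated by construction. The following is a minimal sketch under stated assumptions: the data are pure random noise, the number of variables and observations is arbitrary, and the p-value comes from the Fisher z approximation rather than an exact t test. The function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_chance_findings(n_vars=30, n_obs=50, alpha=0.05, seed=1):
    """Test every pair of independent noise variables; count 'significant' ones."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    hits = sum(
        1
        for i, j in pairs
        if approx_p_value(pearson(data[i], data[j]), n_obs) < alpha
    )
    return hits, len(pairs)


hits, n_pairs = count_chance_findings()
print(f"{hits} of {n_pairs} pairs look 'significant' at alpha = 0.05")
```

With 30 unrelated variables there are 435 pairs, so at a 0.05 threshold roughly 5% of them can be expected to pass the test by chance alone. Any "finding" picked out of such a sweep after the fact would be a data-driven hypothesis of the kind the paragraph above warns against.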