40 CHAPTER 2 Experimental research
or the rewards offered for participation. This phenomenon, called the “Hawthorne
effect,” was documented around 60 years ago (Landsberger, 1958). In many cases,
being observed can cause users to make short-term improvements that typically do
not last once the observation is over.
However, it should be noted that the contexts of the Hawthorne studies and of
HCI-related experiments differ significantly (Macefield, 2007). First, the
Hawthorne studies were all longitudinal while most HCI experiments are not.
Secondly, all the participants in the Hawthorne studies were experts in the tasks
being observed while most HCI experiments observe novice users. Thirdly, the
Hawthorne studies primarily focused on efficiency while HCI experiments value
other important measures, such as error rates. Finally, the participants in the
Hawthorne studies had a vested interest in a successful outcome, since the studies
served as a point of contact between them and their senior management; most
HCI studies carry no such motivation. For these reasons, we believe that the gap
between the observed results of HCI experiments and actual performance is not
as large as that observed in the Hawthorne studies. Even so, we should keep this
potential risk in mind and take precautions to avoid or mitigate the impact of a
possible Hawthorne effect.
EMPIRICAL EVALUATION IN HCI
The validity of empirical experiments and quantitative evaluation in HCI
research has been doubted by some researchers. They argue that the nature
of research in HCI is very different from traditional scientific fields, such as
physics or chemistry, and, therefore, the results of experimental studies that
suggest one interface is better than another may not be truly valid.
The major concern with the use of empirical experiments in HCI is the control
of all possible related factors (Lieberman, 2007). In experiments in physics or
chemistry, it is possible to strictly control all major related factors so that multiple
experimental conditions are only different in the states of the independent
variables. However, in HCI experiments, it is very difficult to control all potential
factors and create experimental conditions that are exactly the same with the
only exception of the independent variable. For instance, it is almost impossible
to recruit two or more groups of participants with exactly the same age,
educational background, and computer experience. All three factors may impact
the interaction experience as well as the performance. It is argued that the use
of significance tests in the data analysis stage only provides a veneer of validity
when the potentially influential factors are not fully controlled (Lieberman, 2007).
We agree that experimental research has its limitations and deficiencies, just
as any other research method does. But we believe that the overall validity of
experimental research in the field of HCI is well-grounded. Simply observing
a few users trying two interfaces does not provide convincing results on the