D each represents a condition. Each row represents a sequence of four conditions
to which one participant can be randomly assigned. Note that each condition
appears only once in each row and column, suggesting that the order of the conditions is
completely counterbalanced for these four participants.
2.4 SIGNIFICANCE TESTS
2.4.1 WHY DO WE NEED THEM?
Almost all experimental investigations are analyzed and reported through significance
tests. If you randomly pick up an HCI-related journal article or a conference
paper, it is very likely that you will encounter statements similar to the following:
On average, participants performed significantly better (F(1,25) = 20.83, p < 0.01)
… in the dynamic peephole condition … rather than the static peephole condition.
(Mehra et al., 2006)
A t test showed that there was a significant difference in the number of lines of
text entered (t(11) = 6.28, p < 0.001) with more entered in the tactile condition.
(Brewster et al., 2007)
Why do you need to run significance tests on your data? What is wrong with the
approach of comparing two mean values of error rate and then claiming that the
application with the lower mean value is more accurate than the other application? Here
we encounter a fundamental issue in statistics that has to be clarified in order to
understand the numerous concepts, terms, and methods that will be discussed in the rest
of this chapter and in Chapters 4 and 5. Let us consider the following two statements:
1. Mike's height is 6′2″. Mary's height is 5′8″. So Mike is taller than Mary.
2. The average height of three males (Mike, John, and Ted) is 5′5″. The average height
of three females (Mary, Rose, and Jessica) is 5′10″. So females are taller than males.
It should not be difficult for you to tell that the first statement is correct while the
second one is not. In the first statement, the targets being compared are the heights
of two individuals, both known numbers. Based on the two numbers, we know that
Mike is taller than Mary. This is simple to understand, even for a child. When the
values of the members of the comparison groups are all known, you can directly
compare them and draw a conclusion. No significance test is needed since there is
no uncertainty involved.
What is wrong with the second statement? People may give various responses to
this question, such as:
• Well, by common sense, I know males are generally taller than females.
• I can easily find three other males and three other females such that the average
height of the three males is greater than that of the three females.
• There are only three individuals in each group. The sizes of the comparison
groups are too small.
• The individuals in both the male group and the female group are not
representative of the general population.
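The sample-size concern raised in these responses can be made concrete with a small simulation. The sketch below is illustrative only: the population means and the standard deviation are hypothetical values chosen for the example, heights are assumed to be normally distributed, and the function names are not from the text. It estimates how often a random sample of males and a random sample of females would produce a female average that is higher than the male average, even though the assumed male population is taller on average.

```python
import random
import statistics

# Hypothetical population parameters (in inches), chosen only for illustration;
# they are assumptions, not values from the text or from any survey.
MALE_MEAN, FEMALE_MEAN, HEIGHT_SD = 69.0, 64.0, 3.0


def sample_mean(rng, mean, sd, n):
    """Mean height of n individuals drawn from an assumed normal population."""
    return statistics.mean(rng.gauss(mean, sd) for _ in range(n))


def reversal_rate(n, trials=10_000, seed=0):
    """Fraction of trials in which the female sample mean exceeds the male one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    reversals = 0
    for _ in range(trials):
        males = sample_mean(rng, MALE_MEAN, HEIGHT_SD, n)
        females = sample_mean(rng, FEMALE_MEAN, HEIGHT_SD, n)
        if females > males:
            reversals += 1
    return reversals / trials


if __name__ == "__main__":
    for n in (3, 30):
        print(f"n = {n:2d} per group: reversal rate = {reversal_rate(n):.3f}")
```

Under these assumed values, groups of three produce the misleading ordering in a small but noticeable share of trials, while groups of thirty almost never do. The observed difference between two sample means therefore cannot, by itself, tell you whether the populations differ.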