272 CHAPTER 10 Usability testing
may include wireframes or paper prototypes, also known as low-fidelity prototypes
(Dumas and Fox, 2007). This type of usability testing is often more informal, with
more communication between test moderators and participants (Rubin and Chisnell,
2008). In early exploratory testing, there is more of a focus on how the user perceives
an interface component rather than on how well the user completes a task (Rubin and
Chisnell, 2008). Paper prototypes are especially useful because they are low cost and
multiple designs can be quickly presented to and evaluated by participants. In addition,
because paper prototypes involve little development time, designers and developers
tend not to become committed to a specific design early on. Users may also feel more
comfortable giving feedback or criticizing the interface when they see that not much
work has been done on it yet. With fully functional prototypes, users may be hesitant
to criticize, since they feel that the system is already finished and their feedback
won’t matter much. More information on paper prototyping can be
found in Snyder (2003).
Usability testing that takes place once a more formal prototype is ready, and
high-level design choices have already been made, is known as a summative test. The
goal is to evaluate the effectiveness of specific design choices. Such mostly functional
prototypes are also known as high-fidelity prototypes (Dumas and Fox, 2007).
Finally, a usability test sometimes takes place right before an interface is released
to the general user population. In this type of test, known as a validation test, the
new interface is compared to a set of benchmarks for other interfaces. The goal is to
ensure that, for instance, 90% of users can complete each task within 1 minute (if
that statistic is an important benchmark). Validation testing is far less common than
formative or summative testing.
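The benchmark comparison in a validation test comes down to simple arithmetic: for each task, compute the share of participants who finished within the time limit and compare it with the target rate. The Python sketch below assumes the 90%/1-minute example from the text, treats a time exactly at the limit as within it, and records a participant who did not finish as `None`. The function names `completion_rate` and `validate` and the sample times are hypothetical, made up for illustration rather than taken from any real study or tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data format: task name -> one completion time per participant,
# in seconds; None means the participant did not complete the task.
Results = Dict[str, List[Optional[float]]]


def completion_rate(times: List[Optional[float]], limit_s: float) -> float:
    """Fraction of participants who completed the task within limit_s seconds."""
    if not times:
        raise ValueError("each task needs at least one participant")
    # Assumption: finishing exactly at the limit counts as "within" it.
    done = sum(1 for t in times if t is not None and t <= limit_s)
    return done / len(times)


def validate(
    results: Results, limit_s: float = 60.0, target: float = 0.90
) -> Dict[str, Tuple[float, bool]]:
    """Map each task to (completion rate, whether it meets the benchmark)."""
    report = {}
    for task, times in results.items():
        rate = completion_rate(times, limit_s)
        report[task] = (rate, rate >= target)
    return report


if __name__ == "__main__":
    # Made-up times for 10 participants, for illustration only.
    sample: Results = {
        "find_order": [32.0, 41.5, 55.0, 48.2, 29.9, 60.0, 38.7, 44.1, 52.3, None],
        "update_address": [45.0, 71.2, 58.9, None, 66.0, 39.4, 50.5, 62.8, 47.7, 55.1],
    }
    for task, (rate, passed) in validate(sample).items():
        verdict = "meets" if passed else "misses"
        print(f"{task}: {rate:.0%} within 60 s, {verdict} the 90% benchmark")
```

With the made-up data, the first task reaches 90% and meets the benchmark, while the second reaches only 60% and misses it. A different benchmark only requires changing `limit_s` or `target`.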
It is important to note that the structure of usability testing varies, regardless of the
type of usability test or the stage of interface development. In general, the data
collected in a validation test or summative test will tend to be much more quantitative
and less focused on users “thinking aloud.” Formative testing, on earlier prototypes,
will tend to involve more thinking aloud and produce more qualitative data. But none
of these patterns is absolute. With well-developed paper prototypes, you could
theoretically measure task performance quantitatively, and you could use the thinking
aloud protocol when an interface is fully developed. The key thing to remember is that
the more users “think aloud” and speak, the more their cognitive flow will be
interrupted, and the longer a task will take to complete (Hertzum, 2016; Van Den Haak
et al., 2003). It is also important to remember that, at first, individual child
participants in usability testing may not feel comfortable criticizing an interface out
loud (Hourcade, 2007), but pairs of children doing usability testing may be more
effective (Als et al., 2005). Usability testing is flexible and needs to be structured
around the activities that are most likely to result in actual changes to the interface
being evaluated.
Different authors use different definitions for these terms. For instance, we have
used the definitions from Rubin and Chisnell. West and Lehman, however, define
formative tests as those that find specific interface problems to fix and summative tests
as those that have a goal of benchmarking an interface’s usability to other similar