12.8 Challenges of computerized data collection
However, effective use of these tools requires addressing several important challenges. As with any method for data collection, automated methods work best if their use is carefully considered in the context of the specific situation that is being studied and the research questions that are being asked. Before collecting data—or designing a system to collect data—you should ask yourself what you hope to learn from your research and how the data that you collect will help you answer those questions. Collecting too much data, too little data, or the wrong data will not be particularly helpful.
Computer use can be considered over a wide range of time scales, with vastly different interpretations. At one end of the spectrum, individual keyboard and mouse interactions can take place as frequently as 10 times per second. At the other extreme, repeated uses of information resources and tools in the context of ongoing projects may occur over the course of years (Hilbert and Redmiles, 2000). Successful experiments must be designed to collect data that is appropriate for the questions being asked. If you want to understand usage patterns that occur over months and years, you probably do not want to collect every mouse event and keystroke: at 10 events per second, a single eight-hour session can generate nearly 300,000 records, and months of use multiply that volume into the millions. Similarly, understanding the dynamics of menu choices within specific applications requires more detailed information than simply which applications were used and when.
The level of detail in collected data is generally referred to as its granularity or resolution. Fine-grained, high-resolution data includes every possible user interaction event; coarse-grained, low-resolution data contains fewer events, perhaps limited to selections of specific menu items, clicks on specific buttons, or interactions with specific dialog boxes.
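To make the distinction concrete, consider how a logging component might support both resolutions through a simple event filter. The following Python sketch is purely illustrative: the event categories, the log_event function, and the record format are hypothetical rather than drawn from any particular logging toolkit.

    import json
    import time

    # Hypothetical granularity levels for an instrumented interface.
    FINE = "fine"      # high resolution: every raw mouse move and key press
    COARSE = "coarse"  # low resolution: widget-level events only

    # Event types retained at coarse granularity (illustrative categories).
    COARSE_EVENTS = {"menu_select", "button_click", "dialog_open", "dialog_close"}

    def log_event(log_file, granularity, event_type, detail):
        """Append one interaction event, honoring the chosen resolution."""
        if granularity == COARSE and event_type not in COARSE_EVENTS:
            return  # drop raw input events when collecting low-resolution data
        record = {"t": time.time(), "event": event_type, "detail": detail}
        log_file.write(json.dumps(record) + "\n")

    with open("session.log", "w") as f:
        # A fine-grained logger keeps everything...
        log_event(f, FINE, "mouse_move", {"x": 104, "y": 220})
        log_event(f, FINE, "key_press", {"key": "a"})
        # ...while a coarse-grained logger records only the menu selection.
        log_event(f, COARSE, "menu_select", {"menu": "File", "item": "Save"})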
The specificity of the questions that you are asking may help determine the granularity of data that you need to collect. Many experiments involve structured questions regarding closed tasks on specific interfaces: which version of a web-based menu layout is better? To support these studies, automated data collection tools must collect data indicating which links are clicked, and when (see the Simultaneous vs Sequential Menus sidebar for an example). Web server logs are very well suited for such studies.
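For instance, a few lines of Python suffice to tally clicks from such a log. The sketch below assumes the widely used NCSA common/combined log format and an illustrative file name (access.log); real servers often customize their log format, so the pattern may need adjusting. The timestamp field captures when each click occurred.

    import re
    from collections import Counter

    # Matches the NCSA common/combined log format: host, identity, user,
    # timestamp, request line, status, and size. Trailing fields in the
    # combined format (referrer, user agent) are simply ignored.
    LINE_RE = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    )

    def count_clicks(log_path):
        """Count successful GET requests per URL path."""
        clicks = Counter()
        with open(log_path) as f:
            for line in f:
                m = LINE_RE.match(line)
                if m and m.group("method") == "GET" and m.group("status") == "200":
                    clicks[m.group("path")] += 1
        return clicks

    # Print the ten most frequently clicked links.
    for path, n in count_clicks("access.log").most_common(10):
        print(n, path)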
Open-ended studies aimed at understanding patterns of user interaction may pose greater challenges. To study how someone works with a word processor, we may need to determine which functions are used and when. Activity loggers that track individual mouse movements and key presses may help us understand some user actions, but they do not record the structured, higher-order details that may be necessary to understand how these individual actions come together to accomplish meaningful tasks. Put another way, if we want to understand sequences of important operations in word-processing tasks, we do not necessarily want a list of keystrokes and mouse clicks. Instead, we would like to know that the user formatted text, inserted a table, and then viewed a print preview. Still higher-level concepts are even harder to track: how do we know when a user has completed a task?
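One simple, admittedly brittle way to recover such operations is to match known command sequences against the logged event stream. In the Python sketch below, every event name and mapping rule is invented for illustration; a real system would need application-specific instrumentation and rules.

    # Toy rule-based aggregator: collapse sequences of low-level command
    # events into named word-processing operations. All names are invented.
    OPERATION_RULES = {
        ("select_text", "open_format_dialog", "apply_format"): "formatted text",
        ("open_insert_menu", "choose_table", "confirm_dialog"): "inserted a table",
        ("open_file_menu", "choose_print_preview"): "viewed a print preview",
    }

    def infer_operations(events):
        """Scan an event stream, greedily matching known command sequences."""
        operations, i = [], 0
        while i < len(events):
            for pattern, name in OPERATION_RULES.items():
                if tuple(events[i:i + len(pattern)]) == pattern:
                    operations.append(name)
                    i += len(pattern)
                    break
            else:
                i += 1  # unmatched event (e.g., a stray mouse move): skip it
        return operations

    stream = ["select_text", "open_format_dialog", "apply_format",
              "open_insert_menu", "choose_table", "confirm_dialog",
              "mouse_move", "open_file_menu", "choose_print_preview"]
    print(infer_operations(stream))
    # -> ['formatted text', 'inserted a table', 'viewed a print preview']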
Researchers have tried a variety of approaches for inferring higher-level tasks
from low-level interaction events. Generally, these approaches involve combining