330 CHAPTER 12 Automated data collection methods
FIGURE 12.1
Computerized data collection systems present a trade-off between power and ease of
implementation and use.
but you may need to dig deeper in the literature to find similar work that offers more
specific guidance on appropriate data granularity, data cleaning, and analytic techniques.
In this chapter, we focus on log and data capture. This is certainly not the full
story of the use of automated data capture in HCI. Newer technologies such as
smartphones and a huge variety of inexpensive sensors provide rich troves of data suitable
for understanding how we interact with computers in a wide variety of environments.
These applications are discussed in Chapter 13, on human data collection, and
Chapter 14, on online and ubiquitous HCI research.
12.2 EXISTING TOOLS
Many commonly used software tools collect and store data that can be used in HCI
research. These tools have the obvious appeal of relative simplicity: although analysis
may take some effort, the data collection itself is handled by software that is already available.
For some widely analyzed data sources—such as web server logs—commercial and
freely available tools can provide substantial assistance in interpretation.
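To give a flavor of the interpretation such tools automate, the sketch below tallies which pages were requested and which HTTP status codes the server returned. It assumes logs in the Common Log Format and uses a deliberately simple split on quotation marks, not the method of any particular analysis tool; the `summarize` function and the sample log lines are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical Common Log Format lines, for illustration only.
log_lines = [
    '192.0.2.7 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2023:13:55:40 -0700] "GET /help.html HTTP/1.1" 200 1045',
    '198.51.100.3 - - [10/Oct/2023:13:56:02 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.3 - - [10/Oct/2023:13:56:09 -0700] "GET /missing.html HTTP/1.1" 404 -',
]

def summarize(lines):
    """Count requests per page and per HTTP status code (hypothetical helper)."""
    pages = Counter()
    statuses = Counter()
    for line in lines:
        # The request is the only quoted field in this format, so splitting on '"'
        # yields [prefix, request, trailer].
        parts = line.split('"')
        if len(parts) < 3:
            continue  # skip malformed lines
        request = parts[1].split()   # e.g. ['GET', '/index.html', 'HTTP/1.1']
        trailer = parts[2].split()   # e.g. ['200', '2326']
        if len(request) >= 2:
            pages[request[1]] += 1
        if trailer:
            statuses[trailer[0]] += 1
    return pages, statuses

pages, statuses = summarize(log_lines)
for page, count in pages.most_common():
    print(f"{count:3d}  {page}")
print(dict(statuses))
```

Dedicated log analysis tools handle many more formats and edge cases; this quotation-mark split, for example, would mishandle a request string that itself contains an escaped quote.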
These advantages do not come without a cost. Using unmodified, commodity software
is likely to limit you to data that is collected by default. If your research questions
require additional data, you may be out of luck. This is often not a real barrier—many
successful research projects have been based on analysis of data from available software.
A sound strategy might be to start with these tools, pushing them to see how far they can
take your research efforts and moving toward more complex measures if needed.
12.2.1 WEB LOGS
Web servers, email servers, and database servers all generate log files that store
records of requests and activity. As a sequential listing of all of the requests made to
a server, a log file provides a record of how the server has been used and when. This
detailed information can be useful for evaluating system performance, debugging
problems, and recovering from crashes.
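Because a web server log is plain text, a few lines of code can turn it into structured, time-stamped records that show who made which request and when. The sketch below assumes the Common Log Format (host, identity, user, timestamp, quoted request, status, bytes); other servers and configurations write different layouts, so the pattern would need adjusting. The `parse_clf_line` function, the `LogEntry` type, and the sample lines are hypothetical, and treating a missing byte count (`-`) as 0 is an illustrative choice.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

class LogEntry(NamedTuple):
    host: str
    time: datetime
    request: str
    status: int
    size: int  # bytes sent; a "-" in the log is recorded as 0 here

def parse_clf_line(line: str) -> Optional[LogEntry]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return LogEntry(
        host=m["host"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        request=m["request"],
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
    )

# Hypothetical sample lines, for illustration only.
sample = [
    '198.51.100.3 - - [10/Oct/2023:13:56:02 -0700] "GET /help.html HTTP/1.1" 404 -',
    '192.0.2.7 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]

# Keep only lines that parsed, then restore chronological order.
entries = sorted(
    (e for e in map(parse_clf_line, sample) if e is not None),
    key=lambda e: e.time,
)
for e in entries:
    print(e.time.isoformat(), e.host, e.request, e.status)
```

Sorting on the parsed timestamp, rather than trusting line order, makes it straightforward to merge logs from several servers into the single sequential record described above.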