354 CHAPTER 12 Automated data collection methods
way to resolve this dilemma is to use both approaches in the same study. Specifically,
data would be collected through both proxies and instrumented browsers (Obendorf
et al., 2007). Although this approach might be more expensive and time consuming
than either approach used independently, the resulting data may be of higher quality.
Another form of hybrid might combine automated data capture and analysis with
observation or other qualitative approaches (Chapter 11). As mentioned, log files
from web activities or instrumented software are limited in their ability to describe
the context of work. It is often difficult to go from the fine-grained detail of
individual actions in a log file to a broader understanding of a user's goals and motivations.
If we combine log data with active observation by a human researcher, we stand a
better chance of understanding not just what the user was doing, but why she was
doing it. The observer might sit behind the subject, watching her activities and
making notes in real time, creating a log of observations that can be synchronized with
the events in the server log. Alternatively, video recordings allow for annotation and
observation at some later time. Log analysis studies involving remote users or those
not involved in a formal study (see the discussion of “A/B” testing in Chapter 14)
might be accompanied by an optional survey at the end of a session, asking users to
complete questions relating to their satisfaction with the system (see Chapter 5 for
more discussion of surveys). In any case, appropriate software can be used to view
individual user events alongside observer annotations and content, thus providing
a more detailed and informative picture than either source would give on its own.
Combinations of multiple log approaches with observer annotations can provide
even greater detail.
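
As a rough illustration of how observer notes might be lined up with logged
events, the following sketch interleaves two timestamped streams into a single
chronological timeline. The sample records, the ISO timestamp format, and the
function names are all hypothetical. The sketch also assumes that the observer's
clock and the logging system's clock agree, which would need to be checked in a
real study.

```python
from datetime import datetime
from heapq import merge

# Hypothetical sample data: (ISO timestamp, description) pairs.
# Each list is assumed to be sorted by time already.
log_events = [
    ("2024-01-15T10:00:02", "click search_button"),
    ("2024-01-15T10:00:09", "page_load results"),
    ("2024-01-15T10:00:30", "click back"),
]
observer_notes = [
    ("2024-01-15T10:00:05", "participant hesitates, rereads query"),
    ("2024-01-15T10:00:28", "participant says results look wrong"),
]

def tag(records, source):
    """Parse each timestamp and label the record with its source."""
    return [(datetime.fromisoformat(ts), source, text) for ts, text in records]

def build_timeline(events, notes):
    """Interleave two time-sorted streams into one chronological list."""
    return list(merge(tag(events, "log"), tag(notes, "observer"),
                      key=lambda record: record[0]))

timeline = build_timeline(log_events, observer_notes)
for when, source, text in timeline:
    print(f"{when:%H:%M:%S} [{source:8}] {text}")
```

Reading the merged timeline, a researcher can see each observer note next to
the logged actions that immediately precede and follow it.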
12.6 DATA MANAGEMENT AND ANALYSIS
12.6.1 HANDLING STORED DATA
Whenever you write or modify software to track user activities, you need to decide
how to manage the data. Two approaches are commonly used: log files and
databases. Log files are plain text files that indicate what happened, when it happened,
and other details, such as the user ID, that might help when interpreting data. Log
files are easy to write, but may require additional tools for interpretation. The
comments from Section 12.2 are generally applicable to any application logs, with one
important exception: commonly available software tools that parse and interpret
standard web log formats may not be immediately applicable to logs that you
develop for your own software, so you may need to write custom tools for
parsing these log files.
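
A custom parser for an application log need not be elaborate. The sketch below
assumes a hypothetical log format in which each line begins with an ISO
timestamp followed by space-separated key=value fields. The format, the field
names, and the sample lines are invented for illustration, and values that
contain spaces would need a more careful parser.

```python
from collections import Counter
from datetime import datetime

def parse_line(line):
    """Split one log line into a timestamp plus a dict of key=value fields."""
    timestamp, *pairs = line.split()
    record = {"time": datetime.fromisoformat(timestamp)}
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record

# Hypothetical sample lines in the assumed format.
sample_log = [
    "2024-01-15T10:00:02 user=u17 action=click target=search_button",
    "2024-01-15T10:00:09 user=u17 action=page_load target=results",
    "2024-01-15T10:00:11 user=u42 action=click target=help_link",
]

# Skip blank lines, then parse the rest.
records = [parse_line(line) for line in sample_log if line.strip()]

# Example summary: how many logged actions each user performed.
actions_per_user = Counter(record["user"] for record in records)
print(actions_per_user)
```

Once each line has been turned into a record, summaries such as counts per
user or per action type follow from ordinary data structures.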
Databases can be very useful for storing user activity information. Carefully
designed relational databases can be used to store each action of interest in one or more
database tables, along with all other relevant information. Powerful query languages,
such as SQL, can then be used to develop flexible queries and reports for
interpretation of the data. This approach may be most useful when working with an
application that already connects to a relational database. When your tool uses a relational