
354    CHAPTER 12  Automated data collection methods




                         way to resolve this dilemma is to use both approaches in the same study. Specifically,
                         data would be collected through both proxies and instrumented browsers (Obendorf
                         et al., 2007). Although this approach might be more expensive and time consuming
                         than either approach used independently, the resulting data may be of higher quality.
                            Another form of hybrid might combine automated data capture and analysis with
                         observation or other qualitative approaches (Chapter 11). As mentioned, log files
                         from web activities or instrumented software are limited in their ability to describe
                         the context of work. It is often difficult to go from the fine-grained detail of individ-
                         ual actions in a log file to a broader understanding of a user's goals and motivations.
                         If we combine log data with active observation by a human researcher, we stand a
                         better chance of understanding not just what the user was doing, but why she was
                         doing it. The observer might sit behind the subject, watching her activities and mak-
                         ing notes in real time, creating a log of observations that can be synchronized with
                         the events in the server log. Alternatively, video recordings allow for annotation and
                         observation at some later time. Log analysis studies involving remote users or those
                         not involved in a formal study (see the discussion of “A/B” testing in Chapter 14)
                         might be accompanied by an optional survey at the end of a session, asking users to
                         complete questions relating to their satisfaction with the system (see Chapter 5 for
                         more discussion of surveys). In any case, appropriate software can be used to view
                         individual user events alongside observer annotations and content, thus providing
                         a more detailed and informative picture than either source would give on its own.
                         Combinations of multiple log approaches with observer annotations can provide
                         even greater detail.
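
One way to picture the synchronization of observer notes with logged events is a simple merge on timestamps. The following is a minimal sketch rather than a description of any particular tool: the function name `merge_timelines`, the `note_offset` parameter (a correction for any difference between the observer's clock and the server's clock), and the sample events are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def merge_timelines(log_events, observer_notes, note_offset=timedelta(0)):
    """Interleave (timestamp, text) pairs from two sources in time order.

    note_offset is a hypothetical correction added to every observer
    timestamp, for cases where the two clocks were not synchronized.
    """
    tagged = [(ts, "log", text) for ts, text in log_events]
    tagged += [(ts + note_offset, "note", text) for ts, text in observer_notes]
    # Sort on the timestamp only; ties keep log events ahead of notes.
    return sorted(tagged, key=lambda row: row[0])


# Hypothetical sample data, invented for this example.
log_events = [
    (datetime(2024, 1, 15, 10, 0, 5), "GET /search?q=flights"),
    (datetime(2024, 1, 15, 10, 0, 40), "GET /results/page2"),
]
observer_notes = [
    (datetime(2024, 1, 15, 10, 0, 20), "Participant hesitates, rereads the form"),
]

timeline = merge_timelines(log_events, observer_notes)
for ts, source, text in timeline:
    print(ts.isoformat(), source, text)
```

Reading the merged timeline from top to bottom places each observer note between the logged actions that surround it, which is the kind of side-by-side view the paragraph above describes.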



                         12.6  DATA MANAGEMENT AND ANALYSIS
                         12.6.1   HANDLING STORED DATA
                         Whenever you write or modify software to track user activities, you need to decide
                         how to manage the data. Two approaches are commonly used: log files and data-
                         bases. Log files are plain text files that indicate what happened, when it happened,
                         and other details—such as the user ID—that might help when interpreting data. Log
                         files are easy to write, but may require additional tools for interpretation. The com-
                         ments from Section 12.2 are generally applicable to any application logs, with one
                         important exception: commonly available software tools that parse and interpret
                         standard web log formats may not work with custom log formats that you develop
                         for your own software, so you may need to write your own tools for parsing
                         these log files.
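
As a minimal sketch of what a custom log format and its parser might look like, the example below writes each event as one tab-separated line (when it happened, the user ID, what happened, and free-form details) and then reads the lines back. The field layout, the names `LogEntry`, `format_entry`, and `parse_line`, and the sample events are assumptions chosen for illustration, not a standard format.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime  # when it happened
    user_id: str         # who did it
    event: str           # what happened
    details: str         # anything else useful for interpretation


def format_entry(entry):
    """Render one entry as a single tab-separated line."""
    return "\t".join(
        [entry.timestamp.isoformat(), entry.user_id, entry.event, entry.details]
    )


def parse_line(line):
    """Parse one line produced by format_entry back into a LogEntry."""
    # maxsplit=3 keeps any tabs inside the details field intact.
    timestamp, user_id, event, details = line.rstrip("\n").split("\t", 3)
    return LogEntry(datetime.fromisoformat(timestamp), user_id, event, details)


# Hypothetical events, invented for this example.
entries = [
    LogEntry(datetime(2024, 1, 15, 10, 0, 5), "u01", "menu_open", "File"),
    LogEntry(datetime(2024, 1, 15, 10, 0, 9), "u01", "menu_select", "File > Save"),
    LogEntry(datetime(2024, 1, 15, 10, 1, 2), "u02", "menu_open", "Edit"),
]

lines = [format_entry(e) for e in entries]
parsed = [parse_line(line) for line in lines]

# A first step toward interpretation: count events per user.
events_per_user = Counter(e.user_id for e in parsed)
print(events_per_user)
```

A format like this is easy to append to from inside an application, but it assumes that no field contains a newline and that only the final field may contain tabs; a real tool would need to escape or reject such values.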
                            Databases can be very useful for storing user activity information. Carefully de-
                         signed relational databases can be used to store each action of interest in one or more
                         database tables, along with all other relevant information. Powerful query languages,
                         such as SQL, can then be used to develop flexible queries and reports for interpreta-
                         tion of the data. This approach may be most useful when working with an applica-
                         tion that already connects to a relational database. When your tool uses a relational