
334    CHAPTER 12  Automated data collection methods




                         including keywords extracted from visited web pages or URLs and page view time can
                         provide increased accuracy in characterizing user sessions (Heer and Chi, 2002).
                            As a stand-alone tool, web log analysis is limited by a lack of contextual
                         knowledge about user goals and actions. Even if we are able to extract individual user paths
                         from log files, these paths do not tell us how the path taken relates to the user's goals.
                         In some cases, we might be able to make educated guesses: a path consisting of
                         repeated cycling between “help” and “search” pages is most likely an indication of a
                         task not successfully completed. Other session paths may be more ambiguous: long
                         intervals between page requests might indicate that the user was carefully reading
                         web content, but they can also arise from distractions and other activity not related to
                         the website under consideration. Additional information, such as direct observation
                         through controlled studies or interviews, may be necessary to provide appropriate
                         context (Hochheiser and Shneiderman, 2001).
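
The two heuristics mentioned above (repeated cycling between "help" and "search" pages, and long intervals between requests) can be sketched in Python. This is a minimal illustration, not a method taken from the cited studies: the session data, the page names, the function names, and the five-minute threshold are all assumptions, and the session is assumed to have already been extracted from a log file as (timestamp, page) pairs.

```python
from datetime import datetime, timedelta


def count_cycles(pages, a="help", b="search"):
    """Count direct transitions between pages a and b, in either direction."""
    return sum(
        1
        for prev, cur in zip(pages, pages[1:])
        if {prev, cur} == {a, b}
    )


def long_gaps(requests, threshold=timedelta(minutes=5)):
    """Return (page, gap) pairs where the wait before the next request exceeds threshold."""
    gaps = []
    for (t1, page), (t2, _next_page) in zip(requests, requests[1:]):
        if t2 - t1 > threshold:
            gaps.append((page, t2 - t1))
    return gaps


# Hypothetical session for one user; timestamps and page names are made up.
session = [
    (datetime(2024, 1, 1, 10, 0, 0), "home"),
    (datetime(2024, 1, 1, 10, 0, 30), "search"),
    (datetime(2024, 1, 1, 10, 1, 0), "help"),
    (datetime(2024, 1, 1, 10, 1, 20), "search"),
    (datetime(2024, 1, 1, 10, 1, 50), "help"),
    (datetime(2024, 1, 1, 10, 12, 0), "article"),
]

pages = [page for _, page in session]
cycles = count_cycles(pages)  # help/search transitions in this session
gaps = long_gaps(session)     # pages followed by an unusually long pause
```

Neither signal settles the question on its own. A high cycle count suggests a task that may not have been completed, and a long gap flags a request that could reflect either careful reading or a distraction, so both are best treated as prompts for follow-up observation or interviews.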
                            Complex web applications can be designed to generate and store additional data
                         that may be useful for understanding user activity. Database-driven websites can
                         track views of various pages, along with other actions such as user comments, blog
                         posts, or searches. Web applications that store this additional data are very similar to
                         “instrumented” applications—programs designed to capture detailed records of user
                         interactions and other relevant activities (Section 12.4.1).
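
As a rough sketch of how a database-driven site might record such actions, the following Python example stores each event in a SQLite table using the standard-library `sqlite3` module. The table layout, the function names, and the action labels are hypothetical choices made for illustration, not a schema described in the text.

```python
import sqlite3
from datetime import datetime, timezone


def open_event_log(path=":memory:"):
    """Open (or create) a hypothetical event table for user actions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "ts TEXT NOT NULL, user_id TEXT NOT NULL, "
        "action TEXT NOT NULL, target TEXT)"
    )
    return conn


def record_event(conn, user_id, action, target=None):
    """Store one user action with a UTC timestamp."""
    conn.execute(
        "INSERT INTO events (ts, user_id, action, target) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_id, action, target),
    )
    conn.commit()


def action_counts(conn):
    """Summarize how many times each kind of action was recorded."""
    rows = conn.execute(
        "SELECT action, COUNT(*) FROM events GROUP BY action ORDER BY action"
    )
    return dict(rows.fetchall())


# Illustrative usage with made-up users and targets.
conn = open_event_log()
record_event(conn, "u1", "page_view", "/articles/42")
record_event(conn, "u1", "search", "log analysis")
record_event(conn, "u2", "page_view", "/articles/42")
record_event(conn, "u2", "comment", "/articles/42")
counts = action_counts(conn)
```

Unlike a raw server log, a table of this kind records application-level actions (a comment, a search query) directly, so they do not have to be inferred from request URLs.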
                            The analysis of web log information presents some privacy challenges that must
                         be handled appropriately. IP addresses that identify computers can be used to track
                         web requests to a specific computer, which may be used by a single person. Analyses
                         that track blog posts, comments, purchases, or other activity associated with a user
                         login can also be used to collect a great deal of potentially sensitive information.
                         Before collecting any such data, you should make sure that your websites have
                         privacy policies and other information explaining the data that you are collecting and
                         how you will use it. Additional steps that you might take to protect user privacy
                         include taking careful control of the logs and other repositories of this data,
                         reporting information only in aggregate form (instead of in a form that could identify
                         individuals), and destroying the data when your analysis is complete. As these privacy
                         questions may raise concerns regarding informed consent and appropriate treatment
                         of research participants, some web log analyses might require approval from your
                         institutional review board (see Chapter 15).
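
One way to act on the aggregate-reporting suggestion is sketched below in Python: raw IP addresses are replaced with keyed hashes before analysis, and only per-page totals are reported. The function names, the secret key, and the sample records (which use addresses from the range reserved for documentation) are assumptions for illustration. Keyed hashing is pseudonymization rather than full anonymization, so it complements the other safeguards and does not replace them.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize_ip(ip, secret_key):
    """Replace an IP address with a truncated keyed hash so raw addresses are not kept."""
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def aggregate_page_views(records, secret_key):
    """Return {page: (request count, distinct pseudonymous visitors)}."""
    views = Counter()
    visitors = {}
    for ip, page in records:
        views[page] += 1
        visitors.setdefault(page, set()).add(pseudonymize_ip(ip, secret_key))
    return {page: (views[page], len(visitors[page])) for page in views}


# Hypothetical records and key; a real key should be random and kept private.
key = b"replace-with-a-random-secret"
records = [
    ("192.0.2.1", "/home"),
    ("192.0.2.1", "/search"),
    ("192.0.2.2", "/home"),
    ("192.0.2.1", "/home"),
]
summary = aggregate_page_views(records, key)
```

Under this scheme, discarding the key along with the raw logs once the analysis is complete makes it harder to link the stored hashes back to specific addresses.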
                            Web server logs have been the subject of many research studies over the years.
                         The development of visualization tools to interpret these logs has been a recurring
                         theme from the 1990s through more recent work (Pirolli and Pitkow,
                         1999; Hochheiser and Shneiderman, 2001; Malik and Koh, 2016). Web search logs,
                         especially those from search engines, have proven to be a fruitful data source
                         for studying how users conduct searches and interpret results (White, 2013; White
                         and Hassan, 2014), including for specific tasks such as searching for medical
                         information (White and Horvitz, 2009). For more on the use of web search logs to study
                         user behavior, see Chapter 14. Web log analysis studies often
                         combine log data with other datasets that confirm and complement it. A study
                         of the social network Google+ combined log analysis with surveys and interviews to