
356    CHAPTER 12  Automated data collection methods




                         to match the formats used by web servers. This mimicry would make your data ame-
                         nable to analysis by web-log analysis tools. Another possibility is to create parsing
                         and analysis software: if you can instrument your user interface to collect interaction
                         information, you will probably find this to be a reasonably manageable task.
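As a minimal sketch of the mimicry described above, the following Python fragment writes one user-interface event as a line in the NCSA Common Log Format used by many web servers. The function name, the participant ID, and the idea of encoding the UI action and widget as a URL-like path are all assumptions made for illustration; a real web-log analysis tool may expect additional fields or a different format.

```python
from datetime import datetime, timezone

def to_common_log_line(host, user, when, action, target, status=200, size=0):
    """Format one UI event as a Common Log Format line (illustrative only)."""
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    # The request field normally holds an HTTP request line. Here a
    # hypothetical UI action and widget path stand in for the URL.
    request = f"GET /{action}/{target} HTTP/1.0"
    return f'{host} - {user} [{timestamp}] "{request}" {status} {size}'

event_time = datetime(2016, 3, 14, 9, 26, 53, tzinfo=timezone.utc)
line = to_common_log_line("127.0.0.1", "p07", event_time, "click", "menu/file/print")
print(line)
```

Encoding the action in the path keeps each line parseable by tools that only accept standard HTTP methods, at the cost of some post-processing when the paths are read back.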
                            More sophisticated questions might require fancier footwork. One common goal
                         is to study the sequence of events. Do users click Print before Save more frequently
                         than they click Save before Print? Similar challenges are found when trying to infer
                         the structure of interaction from web logs, leading to a variety of strategies that have
                         been used to pick out user “sessions” (Heer and Chi, 2002).
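The Print/Save question above can be sketched in a few lines of Python. This example assumes a log already reduced to (timestamp in seconds, action) pairs sorted by time, and it uses a simple inactivity timeout to split events into sessions. The 30-minute threshold and the function names are illustrative assumptions, not the specific strategies evaluated in the cited work.

```python
def split_sessions(events, gap_seconds=1800):
    """Split time-sorted (timestamp, action) pairs into lists of actions."""
    sessions = []
    last_time = None
    for when, action in events:
        # Start a new session on the first event or after a long pause.
        if last_time is None or when - last_time > gap_seconds:
            sessions.append([])
        sessions[-1].append(action)
        last_time = when
    return sessions

def count_first_order(sessions, a="Print", b="Save"):
    """Count sessions where a first occurs before b, and vice versa."""
    counts = {f"{a} first": 0, f"{b} first": 0}
    for actions in sessions:
        # Sessions missing either action say nothing about ordering.
        if a in actions and b in actions:
            first = a if actions.index(a) < actions.index(b) else b
            counts[f"{first} first"] += 1
    return counts

events = [
    (0, "Open"), (40, "Print"), (95, "Save"),
    (5000, "Open"), (5030, "Save"), (5100, "Print"),
    (9000, "Save"), (9020, "Print"),
    (20000, "Print"),
]
sessions = split_sessions(events)
order_counts = count_first_order(sessions)
print(order_counts)
```

The choice of timeout changes what counts as a session, and therefore the answer, so it is worth reporting alongside any results.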
                            Another approach might be to visualize log files. Highly interactive visualiza-
                         tions might show each event in a log file as a point on the screen, while providing
                         tools for filtering and displaying data based on different criteria. As with other ap-
                         proaches for analyzing log files, visualization has been most widely used for web
                         logs. WebQuilt (Hong et al., 2001) displays pages and links between them as nodes
                         and links in a graph. Links are drawn as arrows, with thicker arrows indicating more
                         heavily used links and shading indicating the amount of time spent on a page before
                         selection of a link (Figure 12.11A). Users can zoom into a node to directly examine
                         the page in question (Figure 12.11B).
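A graph display of this kind needs per-link summaries before anything can be drawn. The sketch below is not WebQuilt's implementation. It is an assumed, simplified aggregation that counts how often each link was followed (which could drive arrow thickness) and averages the time spent on the source page before the link was selected (which could drive shading). The input structure and names are hypothetical.

```python
from collections import defaultdict

def aggregate_links(paths):
    """paths: list of sessions, each a list of (page, seconds_on_page)."""
    traversals = defaultdict(int)
    dwell_total = defaultdict(float)
    for path in paths:
        # Pair each page visit with the visit that followed it.
        for (src, dwell), (dst, _) in zip(path, path[1:]):
            traversals[(src, dst)] += 1
            dwell_total[(src, dst)] += dwell
    return {
        edge: {"count": n, "mean_dwell": dwell_total[edge] / n}
        for edge, n in traversals.items()
    }

paths = [
    [("home", 10), ("search", 30), ("results", 5)],
    [("home", 20), ("search", 10), ("contact", 8)],
]
edges = aggregate_links(paths)
for edge, stats in edges.items():
    print(edge, stats)
```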
                            Other visualizations include the use of two-dimensional “starfield” displays
                         for viewing individual requests by date, time, and other attributes (Hochheiser and
                         Shneiderman, 2001) and finer-grained visualizations of mouse events on individual
                         pages (Arroyo et al., 2006; Atterer et al., 2006).
                            As with any other analysis, understanding your goals and planning your data ac-
                         quisition and analysis appropriately is key to effective use of these detailed logs. Ben-
                         Naim et al. describe the use of log analysis for an adaptive learning program, including
                         an explicit list of the questions involved and a description of the approaches used to
                         answer those questions (Ben-Naim et al., 2008). Appropriate storage of log data can
                         also facilitate analysis, with some researchers using business-oriented online analyti-
                         cal processing (OLAP) tools to drill down into relevant details (Mavrikis et al., 2015).
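The drill-down idea can be illustrated without a dedicated OLAP tool. The following sketch assumes log records stored as dictionaries with hypothetical field names ("student", "lesson", "event"). It counts records at a coarse level and then again with an added dimension, which is roughly what drilling down means. Real OLAP tools operate on database tables and offer far more than this.

```python
from collections import Counter

def roll_up(records, dimensions):
    """Count log records grouped by the given dimension names."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

records = [
    {"student": "s1", "lesson": "fractions", "event": "hint"},
    {"student": "s1", "lesson": "fractions", "event": "answer"},
    {"student": "s2", "lesson": "fractions", "event": "hint"},
    {"student": "s2", "lesson": "decimals", "event": "hint"},
]
by_event = roll_up(records, ["event"])                    # coarse view
by_lesson_event = roll_up(records, ["lesson", "event"])   # drill down by lesson
print(by_event)
print(by_lesson_event)
```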
                            Data mining and machine learning techniques can be well suited to extracting
                         patterns from log files. Relatively simple techniques such as association rules
                         (Agrawal et al., 1993) might be used to determine patterns of frequently
                         co-occurring accesses; for example, “sitemap” and “search” page accesses might
                         frequently be associated with clicks on a “contact us” page. Data mining approaches
                         have been used to inform site design and usage characterization from a variety of
                         perspectives, including personalization of content (Srivastava et al., 2000; Eirinaki
                         and Vazirgiannis, 2003) with results familiar to any web users who have seen web
                         pages including advertisements matching search terms that they have recently used.
                         Clustering techniques might also be used to develop models that group users
                         by their usage patterns or predict desirable outcomes such as
                         purchases. Although fascinating, and also relevant to the processing of physiological
                         data (Chapter 13) and ubiquitous computing data (Chapter 14), data mining is largely
                         beyond the scope of this book—for more information, see one of the many online
                         courses or textbooks on machine learning and data mining.
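As a small illustration of the association-rule idea mentioned above, the sketch below computes the two standard measures for a single candidate rule: support (the fraction of all sessions containing every page in the rule) and confidence (the fraction of sessions containing the antecedent pages that also contain the consequent). The session data is invented for the example, and the code evaluates one rule rather than searching for rules, which is what a full mining algorithm would do.

```python
def rule_stats(sessions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Sessions that visited every antecedent page.
    with_antecedent = [s for s in sessions if antecedent <= s]
    # Of those, sessions that also visited every consequent page.
    with_both = [s for s in with_antecedent if consequent <= s]
    support = len(with_both) / len(sessions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical sessions, each reduced to the set of pages visited.
sessions = [
    {"home", "sitemap", "search", "contact us"},
    {"home", "sitemap", "search", "contact us"},
    {"home", "search"},
    {"home", "sitemap", "search"},
    {"home", "products"},
]
support, confidence = rule_stats(sessions, {"sitemap", "search"}, {"contact us"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```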